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I.  INTRODUCTION. 


The  Conference  on  Data  Systems  Lanquage  (CODASYL)  has 
defined  a  data  base  system  fRef.  2  and  3]  which  is  oartially 
generalized  and  oartially  tailored  to  COBOL.  This  system  is 
based  on  network  data  modeling  techniques.  CODASYL  has  made 
the  claim  that  the  system  could  have  other  languages  effec“ 
tively  interfaced  to  it  and  that  the  system  could  be  imple¬ 
mented  in  a  variety  of  environments.  The  Computer  Science 
Department  has  two  Digital  Eguioment  Corporation  POP  11/50 
computers  running  with  the  UNIX  ooerating  system  (Ref.  II. 
The  eouipment  was  acquired  for  research  in  signal  processing 
applications.  This  environment  is  one  in  which  a  CODASYL 
based  data  base  management  system  has  never  been  introduced. 

Relational  data  modeling  techniques  are  the  major  com¬ 
petitor  with  tt'e  CODASYL  system.  Recently^  development  was 
completed  on  a  relational  data  base  management  system  which 
runs  under  UNIX  (Ref.  ^  and  51.  This  system  is  called 
INGRESS  and  was  developed  at  the  University  of  California  at 
Berkeley.  Currently  a  discussion  is  taking  place  in  the 
literature  over  the  relative  merits  and  drawbacks  of  rela¬ 
tional  models  versus  network  models  (chiefly  the  CODASYL 
version)  (Refs.  6 >  7,  8  and  R] .  Although  much  has  been 
written  about  the  merits  of  each  models  relatively  little 


empirical  comparison  has  been  done 


Therefore/  since  steps 


are  being  taken  to  aquire  the  IMGRESS  software^  it  was 
decided  that  a  CODASYL  data  manaqement  system  would  orovide 
a  comolete  suite  of  data  manaaement  software. 

The  tasks  to  be  accomolished  were  design  and  imolemen- 
tation  of  a  UNIX  hosted  CODASYL  system,  design  and  implemen¬ 
tation  of  a  C  language  [Ref.  10]  interface  to  this  system, 
aquisition  of  INGRESS  and  comparative  studies  of  the  two 
systems  for  signal  processing  applications.  This  thesis 
documents  the  design  and  implementation  of  a  UNIX  hosted 
CODASYL  data  base  management  system  and  the  design  of  a  C 
language  interface  to  this  system. 


II 


BACKGROUND 


A.  Data  Access  Methods  -  A  History, 

1.  Technological  Effects. 

During  the  first  and  second  generations  of  comouter 
hardware/  data  storage  media  were  tapes/  relatively  slow 
disks  and  drums  and  the  omnipresent  punched  card.  Data 
storage  and  retrieval  concents  were  shaped  by  these  devices/ 
especially  the  punched  card.  A  file  was  therefore  a  seauen- 
tially  ordered  and  accessed/  contiguously  stored  group  of 
records.  All  the  records  were  of  Hxed  length.  This  view 
is  also  oriented  toward  a  monooroarammi ng  environment  with 
absolute  separation  of  one  user's  files  from  another's. 

With  the  advent  of  third  generation  technoloay/ 
several  factors  began  to  affect  data  storage  and  retrieval 
concepts.  Foremost  was  the  development  of  fast/  high  caoa- 
city/  relatively  inexpensive  direct  access  storage  devices. 
These  devices  stimulated  the  development  of  a  whole  range  of 
new  access  technigues  such  as  hash  coded  and  indexed  file 
organization.  Secondly/  the  multi-user  environment  caused  a 
breakdown  of  the  sharp  division  between  the  execution 
environment  of  users.  This  breakdown  was  accompanied  by  a 
rethinking  of  the  relationships  between  the  overlapping  data 
reguirements  of  users.  All  these  factors  led  to  the 
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development  of  a  new  kind  of  file  system  known  as  a  data 
base  • 


2*  Access  Methods. 

The  inspiration  for  a  multi-purpose  data  base  came 
from  management  systems  in  which  it  was  discovered  that  vast 
overlap  and  duplication  of  data  was  occurinq  between  dif¬ 
ferent  groups  in  a  company*  For  example  the  payroll  and 
personnel  sections  would  typically  each  have  employee  files 
which  were  stored  and  maintained  separately  but  which  over¬ 
lapped  by  80  Per  cent  in  data  content*  An  early  management 
information  system  (MTS)  which  attacked  this  problem  was 
IBM’s  Bill  of  Material  Program  (BOMP)  [Ref*  123  which 
allowed  the  structurina  of  a  parts  list  with  subassemblies 
each  having  its  own  parts  lists?  this  facilitated  the 
management  of  manu f ac t u r i nq  inventories*  When  the  subassem¬ 
blies  occured  in  many  different  parts#  the  savings  afforded 
through  avoided  data  duplication  were  significant*  The  BOMP 
used  a  relatively  flexible  list  structure  and  marked  a  sig¬ 
nificant  departure  from  traditional  file  organization* 

With  the  advent  of  the  consolidated  multi-purpose 
data  base#  a  whole  new  level  of  data  structuring  was  imposed 
on  the  techniques  for  physically  mapping  files  to  devices* 
These  data  structures  employed  relatively  complex  methods 
from  graph  theory  and  other  disiolines  which  had  previously 
been  used  only  on  relatively  small  amounts  of  data  residing 


in  main  storaae 


This  new  level  of  data  base  structure  combined  with 
the  fact  that  a  very  large  portion  of  the  data  base  is  typi¬ 
cally  on-line#  has  imposed  on  programmers  the  reauirement 
for  a  new  skill.  This  skill  has  been  called  navigation 
through  the  data  base  tRef,  13] •  In  this  view#  a  programmer 
must  travel  via  access  paths  through  the  data  base  searching 
for  landmarks  until  he  has  located  the  data  he  desires. 
Choosing  an  inappropriate  access  path  can  be  extraordinarily 
inefficient  and  costly  in  time#  so  the  Penalty  for  lack  of 
navigational  skill  is  high.  It  would  obviously  be  desirable 
to  remove  as  much  of  the  burden  for  navigation  from  the  pro* 
grammer  as  is  practical.  The  develooement  of  modern  data 
base  management  systems  (DBMS)  has  been  made  difficult  by 
the  dilemma  of  desiring  both  optimal  access  paths  and  ease 
and  simplicity  of  use  for  the  programmer, 

3 ,  Te  rm i no  1 ogy • 

This  section  will  attempt  to  define  the  terms  used 
in  this  field  of  inquiry.  The  following  definition  of  a  data 
base  is  due  to  Ref.  13: 

data  base  may  be  defined  as  a  collection  of 
interrelated  data  stored  together  with  as  little 
redundancy  as  possible  to  serve  one  or  more  applica¬ 
tions  in  an  optimal  fashion?  the  data  are  storec^^so 
that  they  are  independent  of  programs  which  use  the 
data#  a  common  and  controlled  approach  is  used  in 
adding  new  data  and  in  modifying  and  retrieving 


existing  data  within  the  data  base.  One  system  is 


said  to  contain  a  collection  of  data  bases  if  they 
are  entirely  seoarate  in  structure." 

Two  types  of  languaoes  are  mentioned  in  connection 
with  DBMS.  The  first  is  the  Data  Description  Language  fODL) 
which  describes  the  types  of  data  entities  which  may  exist 
along  with  the  allowable  attributes.  There  may  be  two  DDL's 
or  two  levels  of  DDL  for  describing  a  data  base.  The  first 
level  description  is  the  system's  view  of  the  data  base  as 
it  is  actually  organized  and  the  second/  a  user's  view  of 
the  data  base.  These  levels  are  called  the  schema  and  sub¬ 
subschema  respectively.  In  the  relational  model  terminol¬ 
ogy/  the  DDL  may  be  called  the  relational  algebra. 

The  second  language  is  the  Data  Manipulation 
Language  (DML)  which  is  concerned  with  the  storage/ 
retrieval  and  modification  of  specific  occurences  of  the 
entity  types  described  by  DDL  statements.  In  relational 
model  terminology/  this  language  corresponds  to  the  rela¬ 
tional  calculus.  The  entities  handled  by  DDL  and  DML  may  be 
records/  sets  or  anything  that  may  need  manipulation.  The 
attributes  may  be  such  thinas  as  data  items/  set  membership/ 
set  ownershio  or  location  within  the  data  base. 

The  data  base  model  is  the  meta-structure  which  is 
imposed  on  the  organization  of  the  data  base.  The  model 
prescribes  the  tyoes  of  entities  which  are  allowed.  It 
defines  the  data  attributes  and  structural  attributes  that 
an  entity  may  have.  The  definition  of  a  DDL  and  DML  is  the 
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implementation  of  the  meta-structures  of  a  data  base  model. 
Currently  the  two  most  widely  discussed  models  are  the  net¬ 
work  model  and  the  relational  model. 

4.  Goals  of  a  DBMS. 

The  following  goals  have  been  proposed  for  a  DBMS 
[Ref.  3]  . 

-  Allow  the  data  structures  suited  to  each  particu¬ 
lar  application  while  permitting  multiole  applications  to 
use  the  data  without  need  for  data  redundancy. 

-  Allow  more  than  one  process  to  concurrently 
retrieve  or  update  data  in  the  data  base. 

-  Enable  the  use  of  a  variety  of  search  strategies 
against  an  entire  data  base  or  a  portion  of  it. 

-  Provide  protection  of  data  from  unauthorized 

access . 

-  Provide  centralized  control  over  the  placement  of 

data. 

-  Provide  device  independence  for  oroarams. 

-  Allow  the  user  to  interact  with  the  data  but  be 
free  of  the  mechanics  of  maintaining  the  structural  associa¬ 
tions  which  have  been  declared. 

-  Allow  as  areat  an  independence  of  programs  from 
data  and  structures  as  possible. 

-  Make  the  data  description  independent  of  any  par¬ 
ticular  programming  language  but  oive  it  the  capability  of 
interfacing  with  a  variety  of  proaramming  languages. 
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These  goals  seem  to  be  Generally  agreed  on  in  the 


literature  as  being  reasonably  complete#  There  is  consider¬ 
able  controversvf  however#  over  the  relative  importance  of 
individual  ooals.  In  particular#  some  contend  that  the  pri¬ 
mary  goal  should  be  to  allow  the  user  to  be  free  of  the  data 
base  structures  entirely  tRef.  9], 

B#  The  Network  Model. 

One  of  the  two  data  base  models  which  has  received  wide 
attention  is  the  network  model.  This  model  is  grounded  in 
graph  theory  and  relationships  between  data  are  represented 
by  some  form  of  directed  graoh*  The  nodes  of  the  graph  ma^y 
be  entities  containing  data  attributes  or  may  simply  be 
place  holders  whose  only  attributes  are  the  arcs  of  the 
graph.  The  arcs  reoresent  loaical  links  between  the  enti¬ 
ties  which  can  be  travelled  in  the  direction  of  the  arc  to 
navigate  through  the  data  base.  Thus#  even  though  the 
implementation  of  the  arcs  may  be  transparent  to  the  user# 
the  access  oaths  are  visible  to  the  user  as  part  of  the 
structure  of  the  data  base.  The  DML  is  said  to  be  prescrip¬ 
tive  of  the  data  access  oaths#  that  is#  it  must  prescribe 
the  course  through  the  data  structures. 

Various  restrictions  as  to  the  type  of  network  allowed 
may  be  imposed  on  a  network  model.  For  example#  the  graphs 
may  be  reouired  to  be  acyclic  or  the  structures  may  be  res¬ 
tricted  to  trees#  chains  or  lists.  A  non-homogeneous  model 
has  the  restriction  that  if  two  nodes  are  connected  by  an 


arc  then  the  entities  reoresented  bv  the  nodes  may  not  be  of 
the  same  tyoe.  A  soecial  class  of  the  non-homoqeneous  net¬ 
work  mode)  is  the  hierarchical  model*  Hierarchicaf  models 
have  the  followinq  restrictions!  the  graphs  must  be  trees# 
an  entity  of  any  type  may  appear  only  once  on  a  particular 
branch  of  the  tree  and  certain  entity  types  must  always 
appear  on  a  given  branch  at  a  higher  level  than  other  entity 
tvpes.  An  example  of  a  hierarchical  data  base  would  be  one 
with  the  entities  country#  state#  county  and  city*  For  a 
particular  country#  one  or  more  of  the  entities  of  state  and 
county  may  be  left  out  between  a  country  and  its  cities#  but 
a  city  cannot  appear  above  a  state#  nor  can  it  appear  above 
another  citv*  An  example  of  a  non-h i e r a rc h i c a  1  data  base  is 
the  BOMP  in  which  a  subassembly  may  contain  other  subassem¬ 
blies  which  may  in  turn  contain  subassemblies.  Note  that 
the  BOMP  is  homogeneous  since  subassemblies  are  linked  to 
subassemb lies. 

C*  The  Relational  Model* 

The  second  of  the  two  most  discussed  modes  for  data 
representation  is  the  relational  model*  This  model  is 
grounded  in  set  theory  and  specifically  in  the  concept  of  a 
relation  in  the  mathematical  sense.  Given  sets  SI#  S2#  ***# 

Sn  (not  necessarily  distinct)#  R  is  a  relation  on  these  sets 
if  it  is  a  sub-set  of  the  Cartesian  product  of  SI  x  S 2  x 
X  Sn.  The  element  of  R  are  n-tuples  whose  jth  component  is 
from  Sj#  for  j  from  one  to  n*  R  is  said  to  be  an  n-ary 
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relation  or  of  riearee  n  and  Sj  is  called  the  jth  domain  of 

R* 


In  the  relational  models  each  set  Sj  must  be  a  set  of 
like^tyoe  attributes.  The  relations  are  named  and  time 
variant  according  to  some  maintenance  algorithm.  An  examole 
of  a  relation  would  be  a  set  called  time-spectrum  made  up  of 
freauency/  time  and  amplitude  triples.  This  ternary  rela¬ 
tion  might  represent  the  latest  30  minutes  of  data  from  a 
hydrophone.  Every  relation  must  have  a  key  by  which  the 
tuples  can  be  identified.  The  key  must  be  unique  (no  two 
tuples  with  the  same  key)  and  non-redundan t  (the  whole  key 
is  needed  for  identification).  In  the  above  example/  fre¬ 
quency  and  time  make  up  the  key. 

The  chief  advantage  of  the  relational  model  is  that  the 
user’s  view  of  the  data  is  independent  not  only  of  the  phy¬ 
sical  mapping  to  media/  but  also  of  the  access  paths 
involved.  The  chief  disadvantaae  is  the  difficulty  of 
devising  an  implementation  which  is  reasonably  efficient  for 
all  applications  [Ref.  8]. 

A  good  deal  of  work  has  been  done  on  normalizing  rela¬ 
tions  to  remove  undesirable  data  representational  charac¬ 
teristics  and  providing  appropriate  operations  and  transfor¬ 
mations  for  relations.  References  1^/  15  and  16  give  a 
definitive  exposition  on  the  theory  of  relational  data 
mode  1 s . 
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D 


The  COOASYL  DBTG  OBMS 


1.  History. 

COOASYL  is  an  informal  and  voluntary  organization 
of  interested  individuals*  suooorted  by  their  institutions* 
who  contribute  their  efforts  and  exoenses  towards  the  ends 
of  designing  and  develooina  techniques  and  languages  to 
assist  in  data  systems  analvsis*  desian  and  imolementation. 
Founded  in  its  most  famous  achievement  has  been  the 
definition  of  the  COmmon  Business  Oriented  Language  (COBOL). 
In  June*  1965*  the  COOASYL  COBOL  Language  Subcommittee  of 
the  Programming  Lanouages  Committee  (PLC)  resolved  to  organ¬ 
ize  a  task  force  to  study  list  orocessing.  In  November* 
1965*  this  task  force  produced  a  prooosed  list  orocessing 
extension  to  COBOL  for  file  management.  In  May*  1967*  the 
List  Processing  Task  Force  changed  its  name  to  the  Oata  Base 
Task  Group  (OBTG)  and  undertook  a  comparative  study  of  data 
base  manaoement  technigues  and  systems.  This  study  was  cul¬ 
minated  by  the  Dublication  of  an  interim  report  in  February* 
1968  and  the  agreement  bv  the  Lanouage  Subcommittee  that 
"COBOL  needs  the  Oata  Base  Concept"  (Ref.  21.  At  the  Tenth 
Anniversary  Meeting  of  COOASYL  held  in  May*  1969*  considera¬ 
tion  was  given  to  separating  the  data  description  and  data 
manipulation  languages.  The  idea  received  wide  endorsement 
at  the  meetina  and  was  the  basis  for  the  direction  of 
efforts  by  rhe  OBTG  until  October*  1969  at  which  time  Ref. 
17  was  oresented.  From  the  time  of  publication  of  Ref.  17 
until  the  publication  of  Ref.  3  in  Aoril*  1971*  179 
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proDosals  for  changes  and  extensions  to  the  DDL  and  DML  were 
considered^  of  which  130  were  incoroorated  into  Ref,  3.  In 
June»  1971»  it  was  decided  that  the  schema  DDL  should  be 
developed  seoaratelv  from  the  COBOL  DDL  and  DML.  Accord” 
inglvf  the  Data  Descriotion  Languaoe  Committee  (DDLC)  was 
formed  as  a  separate  organization  from  the  PLC.  The  DDLC 
preceded  with  modifications  and  enhancements  to  the  DBMS  and 
schema  DDL  definitions  and/  in  June/  1973/  produced  Ref,  2, 
This  document  is  currently  the  basis  for  the  CODASYL  DBMS, 

2,  Terminology  and  Concepts, 

For  a  complete  description  of  the  CODASYL  schema  DDL 
statements  and  DBMS  design  see  Ref,  2,  The  schema  DDL  is 
used  to  describe  a  data  base  and  has  the  following  entity 
types:  Data  items/  data  aggregates/  records/  areas  and 
sets , 


A  data  item  is  an  occurence  of  a  named  atomic  data 
attribute.  It  is  the  smallest  unit  of  named  data.  The  set 
of  values  that  a  data  item  can  assume  is  called  its  ranoe. 
The  range  of  an  item  is  always  restricted  to  values  of  a 
particular  tyoe.  The  possible  types  are  arithmetic  data/ 
string  data/  data  base  '<eys  and  implementor  defined  types, 

A  data  aggregate  is  an  occurence  of  a  named  collec¬ 
tion  of  data  items.  There  are  two  kinds:  vectors  and 
repeating  groups,  A  vector  is  a  one  dimensional  seouence  of 
data  items/  all  with  identical  characteristics.  A  repeating 
group  is  a  collection  of  data  attributes  that  occurs 
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multiple  times  within  a  record  occurence.  The  collection  of 
attributes  may  include  data  items  and  data  agaregates. 

A  record  is  an  occurence  of  a  named  collection  of 
zero  or  more  data  items  or  data  aagregates.  Each  record 
entry  defines  a  record  tyoe  of  which  there  may  be  zero  or 
more  occurences  within  the  data  base.  The  record  is  the 
smallest  addressable  entity  within  the  data  base. 

A  set  is  a  named  collection  of  records.  Each  set 
entry  in  the  schema  defines  a  set  type  for  which  zero  or 
more  occurences  (sets)  may  exist  in  the  data  base.  Each  set 
type  declared  in  the  schema  must  have  one  record  type 
declared  as  its  owner  and  may  have  one  or  more  record  types 
declared  as  its  members.  Each  set  occurence  which  exists  in 
the  data  base  must  contain  exactly  one  record  of  its  owner 
type  and  zero  or  more  of  its  member  record  types.  A  special 
set  type  may  be  delared  which  has  one  and  only  one  occurence 
and  whose  owner  is  the  DBMS.  A  set  so  declared  is  said  to 
be  a  singular  set.  There  is  no  provision  for  a  record  type 
to  be  both  an  owner  and  member  record  type  of  the  same  set. 
This  means  the  COOASYL  model  is  non-homogeneous.  It  is  not# 
however#  hierarchical  in  that  set  types  may  be  defined  with 
ownership  and  membership  such  that  cycles  can  occur. 

An  area  is  a  named  collection  of  records  which  need 
not  preserve  owner/member  relations.  An  area  mav  contain 
occurences  of  multiple  record  types  and  a  record  type  may 
occur  in  multiple  areas.  A  particular  recorg  occurence  of  a 
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record  is  assigned  to  an  area  when  it  when  it  is  created  and 
it  may  not  migrate  out  of  that  area.  An  area  may  be 
declared  to  be  temporary.  Temoorary  areas  are  created  esoe- 
cially  for  a  run-unit»  exist  for  the  life  of  the  run-unit 
and  are  destroyed  when  the  process  terminates.  Two  run- 
units  may  have  a  oarticular  temporary  area  open  concurrently 
but  each  run-unit  is  using  a  different  version  of  the  area 
which  is  unique  to  that  particular  run -unit.  The  concept  of 
area  allows  the  subdivision  of  the  data  base.  It  allows  the 
DBMS  to  control  placement  of  an  entire  area  to  provide  effi¬ 
cient  storage  and  retrieval.  Areas  are  a  convenient  unit 
for  recovery  and  also  provide  a  convient/  natural  subdivi¬ 
sion  for  allowing  a  part  of  the  data  base  to  be  removed  to 
off-line  storage. 

A  schema  consists  of  DDL  entries  and  is  a  complete 

\ 

description  of  a  data  base.  It  includes  the  names  and 
descriptions  of  all  areas^  set  tyoes  and  record  tyoes  that 
may  aopear  in  the  data  base.  A  data  base  is  the  totality  of 
all  recordSf  sets  and  areas  controlled  by  a  schema.  For  an 
installation  to  have  multiole  data  basesf  it  must  have  mul¬ 
tiple  schemas  and  the  content  of  the  data  bases  must  be  dis¬ 
joint. 

No  schema  DDL  entry  may  include  references  to  the 
physical  devices  or  media  soace.  Thus  a  schema  written  in 
the  DDL  is  indeoendent  of  the  physical  storage  of  data  and 
the  data  may  be  stored  on  any  combination  of  storage  media 


available  to  a  DB^S 


Some  devices/  due  to  their  sequential 


nature^  may  not  allow  the  full  advantages  of  DDL  facilities/ 
however  the  use  of  these  devices  is  not  precluded. 

A  program  is  a  set  or  aroup  of  instructions.  User 
programs  must  have  access  to  a  sub*schema  DDL  description  of 
that  portion  of  the  data  base  they  are  interested  in.  Addi¬ 
tionally/  they  must  be  able  to  use  a  DML  to  interact  with 
the  data  base  through  the  DBMS. 

A  run-unit  is  the  execution  of  one  or  more  programs 
viewed  by  the  operating  system  as  a  unit.  Under  03/360/  the 
run-unit  might  be  a  job  and  under  UNIX/  a  parent  process  and 
any  children.  The  run-unit  makes  requests  of  the  DBMS  which 
in  turn  consults  the  schema  and  interacts  with  the  operating 
system  to  fulfill  the  request. 

A  user  working  area  (UWA)  is  conceptually  a  loading 
and  unloading  zone  where  all  data  provided  to  a  run-unit  by 
the  DBMS  and  all  data  to  be  picked  up  bv  the  DBMS  must  by 
placed.  The  DBMS  has  its  own  system  buffers  which  it  uses 
to  manipulate  the  data  base.  It  uses  the  UWA  only  for  input 
and  output  of  data  for  the  requesting  run-unit.  Each  run- 
unit  has  its  own  UWA. 

3.  The  Schema  vs.  the  Sub-schema. 

The  subschema  has  the  following  characteristics.  An 
arbitrary  number  of  oossibly  overlapoing  sub-schemas  may  be 
declared.  Multiple  programs  may  reference  a  sub-schema  but 
they  have  access  only  to  that  portion  of  the  data  base 
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included  in  the  sub-schema.  ThuSf  the  sub-schema  DDL 
description  enables  the  subsetting  of  the  data  base  so  that 
a  user  program  need  only  worry  about  that  portion  of  the 
data  base  it  usesf  and  insulates  the  remainder  of  the  data 
base  from  the  user.  A  measure  of  the  data  independence  is 
provided  between  the  schema  and  sub-schema.  The  sub-schema 
description  may  differ  from  the  schema  in  the  following 
way  s . 


a.  Data  Item  Level. 


Descriptions  of  items  may  be  omitted.  Included 
items  may  be  of  a  different  type  or  in  a  different  position 
within  the  record. 

b.  Data  Agqregate  Level. 

Descriptions  of  specific  data  aggregates  may  be 
omitted.  Data  aggregates  and  items  may  have  additional 
structure  imposed  on  them  (e.a.  vectors  may  become  multi¬ 
dimensional  arrays).  The  position  of  data  aggregates  within 
a  record  may  be  changed. 


c .  Record  Level. 

Descriptions  of  records  may  be  omitted. 
Descriptions  of  new  record  types  composed  of  data  from  other 
record  types  may  be  introduced  (not  supoorted  by  the  COBOL 
or  C  DDL's). 
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d .  Set  Level 


Descriptions  of  specific  set  tyoes  may  be  omit¬ 
ted.  Different  set  selection  criteria  may  be  soecified. 
Descriptions  of  specific  membe r - rec o rd  tyoes  may  be  omitted. 

e.  Area  Level. 

Descriptions  of  specific  areas  and  the  records 
within  them  may  be  omitted/  while  occurences  of  the  same 
record  type  in  other  areas  are  included. 

y.  The  Schema  and  the  DML . 

The  relationship  between  the  DDL  and  the  DML  is  that 
between  declarations  and  procedures.  In  order  to  specify 
this  relationship/  a  set  of  basic  data  martioulation  func¬ 
tions  must  be  defined  which  is  DML  and  host  lanquage 
independent.  Specific  commands  provided  by  a  particular  DML 
must  be  resolved  into  these  basic  functions.  Basic  func¬ 
tions  include  the  capability  of  selecting  records/  present¬ 
ing  them  to  a  run-unit  and  addino/  changing  or  removing 
records  and  relationships. 

5.  Data  Base  Administration. 

Certain  facilities  must  be  availible  to  support  the 
user  programs.  These  tools  are  not  defined  in  the  CODASYL 
DBMS  and  may  include  the  following. 
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a.  Recovery  Routines. 

Data  base  recovery  routines  may  be  used  includ* 
i  ng  activity  loaqing^  checkpoint  and  rollback, 

b.  Utility  Routines. 

Utility  and  service  routines  are  required  to 
support  a  data  base  in  day-to-day  operations.  Examples 
include  routines  for  editinq  and  printing^  loading  and  dump¬ 
ing#  preconditioning#  garbage  collection#  statistical 
analysis  and  comparison. 

c.  Schema  (^et  a- 1  anguaqe  . 

This  language  permits  changes  in  the  schema  and 
cause  them  to  be  reflected  in  the  data  base.  I’Jithout  such  a 
language#  the  changes  must  be  made  by  defining  a  new  schema 
and  recreating  the  data  base  accordingly. 

d.  Device  Media  Control  Language  (OMCL), 

This  1 anguaae  provides  for  assignment  of  data  to 
devices  and  media  space#  and  specification  and  control  of 
buffering#  paging  and  overflow. 

6.  Data  Base  Procedures. 

At  various  points  in  the  accessing  of  a  data  base# 
non-standard  computations  or  processing  may  be  required.  To 
allow  for  these  situations#  the  capability  is  provided  to 
define  data  base  procedures.  These  procedures  may  be 
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invoked  for  checkina  of  privacy  lockSf  oroducino  comcuted 
results  from  other  items/  searching  alaorithms/  data 
compression  and  expansion/  validity  checking  or  system 
i ns  t  rument  at i on  , 

7,  Record  Placement  Control. 

The  schema  DDL  permits  specification  of  an  area  or 
areas  to  which  record  occurences  of  a  particular  type  must 
be  assigned.  The  schema  DDL  also  includes  a  clause  which 
causes  records  being  added  to  be  placed  near  some  other 
record.  Conceptually/  the  effect  of  such  clauses  is  to 
cause  clustering  of  records  which  are  likely  to  be  used  in 
conjunction  with  one  another.  These  declarations  for 
selecting  the  area  and  location  within  the  area  are  the 
WITHIN  clause  and  the  LOCATION  clause/  respectively/  of  the 
record  subentry.  The  fact  that  the  schema  DDL  permits 
placement  control  is  not  assumed  by  CODASYL  to  have  any  phy¬ 
sical  connotations. 

8.  Data  Base  Keys. 

The  DDL  assumes  that  every  record  occurence  in  the 
data  base  has  a  unigue  identifier  which  enables  the  DBMS  to 
distinguish  it  from  every  other  record  in  the  data  base. 
This  key  must  be  assigned  when  the  record  is  created  and 
remains  with  it  for  the  life  of  the  record.  This  key  may  be 
supplied  to  the  DBMS  by  a  run-unit  or  data  base  procedure/ 
generated  from  the  record's  contents  or  assianed  by  the 
DBMS.  The  Permanence  of  the  key  must  be  insured  since  any 
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run-unit  may  use  the  keys  to  refer  to  the  record. 

9,  Order i no  of  Sets. 

Each  set  tyoe  declared  in  the  schema  must  have  an 
ordering  soecified  for  it.  This  order  is  maintained  by  the 
DBMS  and  is  a  logical^  not  a  ohysicaW  ordering,  ThuSf  the 
same  record  occurences  could  oarticipate  as  members  in 
several  sets  of  different  tvoes  and  be  ordered  differently 
in  each  of  the  sets.  The  member  records  of  each  occurence 
of  a  given  set  tyoe  can  be  ordered  in  any  of  the  following 
way  s , 

-  Sorted  in  ascending  or  descending  order  based  on  the 
value  of  specified  keys.  These  keys  may  be  data  items  in 
the  member  records/  the  names  of  the  member  records/  the 
data  base  keys  of  the  member  records  or  some  combination  of 
the  above, 

-  Sorted  in  the  order  resulting  from  inserting  new 
member  records  first  in  the  set/  last  in  the  set  or  before 
or  after  the  set  member  which  is  currently  known  to  the 
reguesting  run-unit. 

-  Sorted  in  the  order  most  convenient  to  the  DB'^S, 

10.  Search  Keys. 

An  arbitrary  number  of  search  keys  may  be  declared 
for  a  set  tyoe  regardless  of  whether  it  is  sorted  or  not. 
The  components  of  the  search  keys  must  be  data  items 
included  in  the  member  records  of  the  set.  The  declaration 
of  a  search  key  causes  the  OB^s  to  develoo  and  use  some  kind 
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indexing  for  the  member  records  of  each  occurence  of  the 
set  tyoe.  The  term  indexina  is  used  here  to  refer  to  any 
technique  which  does  not  involve  a  comolete  scan  of  the 
records  involved. 

11.  Set  Membershio. 

A  record  type  may  have  different  kinds  of  set 
membership  declared  for  different  set  types.  Automatic 
membership  means  that  membership  is  established  in  an 
approoriate  occurence  of  a  set  tyoe  when  a  record  is  added 
to  the  data  base.  Manual  membershio  means  that  membership 
can  only  be  established  in  a  set  occurence  by  a  run-unit 
executing  an  insert  function. 

Mandatory  or  optional  membership  concerns  the  remo¬ 
val  of  a  record  from  a  set  occurence.  Once  a  record  has 
been  established  as  a  member  of  a  set  for  which  it  has  man¬ 
datory  membershio/  it  cannot  be  removed  until  the  record  is 
deleted.  If  the  membershio  is  optional/  the  record  may 
cease  to  have  membershio  via  a  remove  function. 

A  set  tyoe  may  be  declared  as  dynamic.  A  dynamic 
set  may  have  a  record  of  any  type  inserted  into  it  or 
removed  from  it.  If  a  set  type  is  declared  to  be  dynamic/ 
no  member  records  may  be  declared  for  it. 

12.  Set  Selection. 

In  general/  there  will  be  more  than  one  set  of  a 
given  type  in  the  data  base.  It  is  therefore  necessary  to 
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provide  a  means  for  identifvinq  the  proper  set  when  member 


records  are  stored  and  retrieved.  The  SELECTION  clause  of 
the  member  subentry  in  the  DDL  controls  the  strategy  for 
selecting  a  specific  set  of  a  given  type.  A  separate  SELEC¬ 
TION  clause  is  required  for  each  member  record  type  and  set 
type  pair.  The  SELECTION  clause  provides  for  naminq  a 
series  of  sets  which  form  a  continous  path  to  the  desired 
set.  For  all  the  sets  along  the  path,  other  than  the  first 
named  set,  the  DB^S  limits  its  search  to  the  member  records 
of  the  set  selected  at  the  previous  steo  in  the  oath. 

13.  Privacy  of  Data. 

Protection  aoainst  unauthorized  data  access  is  pro¬ 
vided  through  a  mechanism  of  orivacy  locks  which  are  speci¬ 
fied  in  the  schema.  Privacy  keys  must  be  orovided  by  a 
run-unit  seeking  to  access  or  alter  data  protected  by  a 
privacy  lock.  The  schema  DDL  provides  for  declaring  privacy 
locks  at  the  schema,  area,  record,  data  item,  data  aggre¬ 
gate,  set  and  member  levels.  Locks  can  be  declared  for 
specific  functions  at  each  of  these  levels.  A  privacy  lock 
is  either  a  value  which  must  be  matcheo  by  a  co r resoond i nq 
orivacy  key  or  a  data  base  procedure  which  is  called  to 
validate  the  privacy  key.  If  a  procedure  is  used,  it 
returns  a  yes  or  no  answer,  and  beyond  this  the  action  of 
such  a  procedure  is  implementor  defined. 
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1^.  Intea pity  of  Data. 

The  DDL  orovides  for  the  checking  of  the  validity 
of  a  data  item  whenever  a  value  is  changed  or  a  new  value  is 
stored  in  the  data  base.  In  addition,  provision  is  made  for 
the  naming  of  data  base  procedures  which  the  DBMS  invokes 
when  a  run-unit  attempts  to  update  nominated  records  or 
sets.  This  feature  enables  a  check  of  any  update  or  series 
of  updates  applied  to  the  data  base. 

E.  Laboratory  Eguioment  and  Software. 

The  computer  eguioment  in  the  laboratory  consists  of  two 
POP  11/50’s  with  associated  peripherals.  The  information 
about  the  eauipment  which  is  relevant  to  this  thesis  is 
minimal,  however  it  should  be  noted  that  the  DBMS  was 
developed  using  an  interactive  display  terminal  and  is 
oriented  toward  that  environment. 

The  operating  system  which  supports  the  0B^*S  is  UNIX. 
Reference  1  contains  a  supplement  to  the  following  discus¬ 
sion  of  UNIX. 

1 .  The  File  System. 

The  most  important  function  of  UNIX  is  to  provide  a 
file  system.  From  the  user’s  point  of  view  their  are  three 
kinds  of  files:  ordinary  files,  directories  and  special 
f  i  1  e  s  . 
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An  ordinary  file  can  contain  any  information  the 
user  desires.  The  system  imposes  few  special  structure 
requirements  on  fileSf  however  some  programs  expect  files  of 
a  certain  format.  A  text  file  consists  of  a  string  of  char¬ 
acters  with  lines  delimited  by  new  line  characters.  A 
binary  program  file  is  a  sequence  of  words  as  they  will 
aopear  in  main  memory  when  the  proaram  is  executed.  The 
assembler  and  loader  programs  use  soecial  object  file  for¬ 
mats. 


Directories  provide  the  maDoing  between  the  names  of 
files  and  the  files  themselves.  They  induce  a  structure  on 
the  file  system  as  a  whole.  A  directory  behaves  exactly 
like  an  ordinary  file  exceot  that  the  system  controls  its 
format  and  contents.  Each  system  user  has  a  directorv  asso¬ 
ciated  with  his  user  name  and  he  may  create  sub -d i rec t o r i e s 
to  organize  collections  of  his  files.  The  system  has 
several  directories  which  it  maintains  for  its  own  use.  One 
is  the  root  directory.  The  directories  in  a  file  system 
form  a  tree  and  the  root  is  the  base  of  this  tree.  Thusr 
any  file  in  the  system  can  be  located  bv  tracing  a  oath  from 
the  root  throuah  the  aoprooriate  directories.  Another  sys¬ 
tem  directory  contains  all  the  orograms  which  are  used  as 
system  commands  and  is  special  only  in  that  certain  proarams 
"know”  its  name. 

Each  directory  must  aopear  as  an  entry  in  exactly 
one  other  directory  called  its  parent.  Each  directory  has 
two  special  entries.  These  are  the  name  which  refers  to 
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the  directory  itself  and  the  name  ” . . "  which  refers  to  the 
parent  directory.  These  entries  enable  reference  to  the 
directory  and  its  oarent  without  knowino  the  name  expli¬ 
citly. 


File  names  are  strings  of  or  fewer  characters. 
Identification  of  a  file  to  the  system  is  accomplished 
through  a  string  of  directory  names  separated  by  virqules 
("/")  and  terminated  by  the  file  name  desired.  This  string 
is  called  a  path  name.  t'Vhen  the  path  name  is  st’arted  with  a 
virgulef  the  system  begins  the  path  search  at  the  root 
directoryf  otherwise  it  starts  at  the  user's  current  working 
directory.  For  example/  the  path  name 
" / f o X t ro t /un i f o rm/c ha r 1 i e "  would  cause  the  system  to  start 
at  the  root/  search  for  directory  "foxtrot”/  search  "fox¬ 
trot"  for  directory  "uniform"  and  find  Hie  "charlie"  in 
"uniform".  The  file  "charlie"  could  be  any  type  file/ 
including  a  directory.  In  another  case/  the  pathname  "kilo" 
would  cause  the  system  to  search  the  user's  current  direc¬ 
tory  for  "kilo".  The  oath  name  refers  to  the  root 
i t  se 1 f  . 


Special  files  provide  the  means  of  handling  I/O  dev¬ 
ices.  Each  device  supported  by  UNIX/  including  communica¬ 
tions  lines  and  main  memory/  is  associated  with  one  or  more 
special  'files.  These  files  can  be  read  or  written  in  the 
same  manner  as  ordinary  files  except  that  the  result  is  the 
activation  of  ti^e  appropriate  device.  All  special  files 
reside  in  directory  "/dev". 
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The  access  control  or  protection  scheme  in  UNIX  is 
relatively  simple.  Each  user  known  to  the  system  has  a 
unique  user  number  called  the  user  id.  When  a  file  is 
created^  the  aporopriate  user  id  is  associated  with  it  and 
bits  are  set  in  the  directory  entry  indicating  which  users 
have  permission  to  read  t  write  or  execute  the  file.  A 
facility  is  provided  for  executable  files  called  set*user-id 
whereby  when  the  files  are  executed  the  resulting  process 
assumes  the  user  id  of  the  owner  of  the  executable  file. 
This  enable  a  system  program  executed  by  a  user  to  access 
files  which  the  user  cannot  directly  access  himself.  Since 
anyone  may  cause  his  executable  files  to  use  set-user-idf 
this  feature  is  generally  available  to  provide  protected 
access  to  files.  The  system  recoanizes  one  user  id  (the 
"super  user")  as  being  free  of  any  access  restrictions.  The 
major  flaw  in  the  UNIX  protection  scheme  is  that  there  is  no 
way  to  monitor  or  lock  out  simultaneous  opening  of  a  file  by 
multiple  proarams  with  access  rights  to  the  files.  The 
system's  authors  contend  that  these  features  are  neither 
necessary  nor  sufficient  for  integrity  controls  (Ref.  IJ. 
However,  the  reasoning  behind  declaring  the  features 
unnecessary  was  that  "we  are  not  faced  with  large  single¬ 
file  data  bases  maintained  by  independent  processes". 

2.  Input/Output  (I/O)  Calls. 

Under  UNIX,  I/O  calls  are  designed  to  eliminate  the 
difference  between  the  various  devices  and  forms  of  access. 
The  file  system  organizes  all  media  space  into  512  bvte 


35 


blocks  which  are  its  smallest  readable  and  writable  unit 


Con seauen t 1 y »  reads  and  writes  of  512  bytes  starting  on  a 
512  byte  boundary  are  most  efficient.  However^  no  loaical 
record  sire  is  imoosed  by  the  system/  nor  is  there  any  dis¬ 
tinction  between  random  or  sequential  access.  To  read  or 
write  an  already  existing  file/  an  "ooen"  call  must  be  made. 
This  system  call  is  oassed  a  path  name  and  returns  a  number/ 
called  a  file  descriptor/  which  identifes  the  open  file  to 
the  system.  The  file  descriptor  is  used  in  subsequent  I/O 
calls.  In  order  to  create  and  open  a  file/  a  "creat"  call 
must  be  made.  This  call  requires  parameters  which  specify 
the  file  name  and  access  mode/  and  returns  a  file  descrip¬ 
tor.  A  "creat"  on  an  existing  file  truncates  it  to  zero 
lenoth.  An  ooen  file  may  be  accessed  via  "read"  and  "write" 
calls.  These  system  calls  require  the  file  descriptor/  the 
location  of  a  read/write  buffer  and  the  length  of  the 
buffer. 


To  enable  random  access  of  appropriate  files/  the 
"seek"  call  is  provided.  This  svstem  call  merely  changes 
the  read/write  pointer  associated  with  an  open  file.  The 
read/write  pointer  contains  the  byte  offset  from  the  begin¬ 
ning  of  the  file  at  which  the  next  access  will  beain.  Other 
system  calls  exist  for  such  file  manipulations  as  closing  a 
file/  finding  the  status  of  a  file/  changing  the  protection 
mode  or  owner  of  a  file/  creating  or  removing  a  directory/ 
making  a  link  to  an  existing  file  and  deleting  a  file. 
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3.  Processes  and  Imaqes. 

An  image  is  an  entire  comouter  execution  environment 
including  main  memory  image#  general  register  values#  the 
status  of  ooen  files  and  the  identity  of  the  current  direc” 
tory.  Thus  the  image  constitutes  a  state  vector  of  a  pro- 
cess  which  contains  all  information  necessary  to  resume  exe“ 
cution  of  the  process.  A  process  is  the  execution  of  an 
image  while  the  virtual  machine  is  imoosed  on  the  hardware 
by  the  system.  The  virtual  address  space  of  a  orocess  is 
divided  into  three  logical  segments:  the  grog ram  text 
(instructions  and  constants)#  data  and  stack.  Pure  text  is 
read  only  for  the  user  while  the  data  and  stack  segments  may 
expand  or  contract  in  size. 

A  process  comes  into  existence  through  a  "fork"  call 
executed  by  another  orocess.  This  system  call  creates  an 
exact  duplicate  of  the  image  of  the  calling  process.  The 
only  difference  between  the  processes  is.  that  one  process  is 
considered  the  parent  and  the  other  the  child.  Both  execute 
as  if  returning  from  the  "fork"  call.  The  parent  receives 
as  a  return  value  a  number  called  the  process  id#  which 
uniauelv  identifies  the  child. 

The  child  receives  zero  as  its  return  value.  Synchroniza¬ 
tion  between  parent  and  child  is  provided  by  the  "wait" 
call.  When  a  process  with  children  executes  a  "wait"#  its 
execution  is  suspended  until  one  of  its  children  terminates. 
The  return  value  of  the  "wait"  is  the  process  id  of  the  ter¬ 
minated  child.  Interprocess  communication  is  provided  by 
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the  **0  106*^  cal).  This  system  call  sets  ud  a  channel  which 
can  be  read  or  written  by  any  process  which  has  as  an  ances¬ 
tor  the  process  that  executed  the  "pipe**  call« 

The  ’^exec"  system  call  is  provided  to  allow  the  exe¬ 
cution  of  a  program  (i,e.f  executable  file).  The  "exec” 
call  needs  a  oath  name  to  the  file  as  its  argument.  A  pro- 
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"exec"  occurs  only  if  the  function  is  unsuccessful.  Termi¬ 
nation  of  a  process  can  be  accomplished  via  an  "exit"  system 
call.  When  an  "exit"  is  executed^  the  process  and  associ¬ 
ated  image  cease  to  exist. 

The  C  Lanauaqe. 

C  is  the  programming  1 anquaae  primarily  used  under 
UNIX,  Most  of  UNIX  itself  is  coded  in  C,  C  provides  modern 
control  structures  to  allow  structured  GOTO-less  coding. 
Its  design  objectives  were  to  give  shorter  and  clearer  coder 
encourage  modularity  and  good  orooram  organization  and  oro- 
vide  facilities  for  many  different  types  of  data  including 
pointers  and  character  strings. 

A  C  program  consists  of  a  group  of  functions  (one  of 
which  must  be  named  "main")  and  oossibly  some  external  data 
declarations.  Parameters  may  be  passed  between  functions 
via  call  and  return  arguments  or  through  external  data 


^8 


items.  C  is  not  a  block  structured  1 anauage  in  that  func¬ 
tions  cannot  be  defined  locally  to  other  functions  and 
external  data  names  may  not  be  redeclared  locally  to  a  func¬ 
tion.  However/  the  block  structured  language  feature  of 
allowing  a  grouo  of  statements  to  be  considered  as  a  single 
statement  is  included.  This  qrouDinq  is  accomplished  by 
enclosing  the  statements  within  "f"  and 

The  basic  data  types  in  C  are  "int"/  "char"/ 
"float"/  "double"  and  "struct".  In  addition/  arrays  of  or 
pointers  to  any  of  these  types  can  be  declared.  Items  of 
type  "int"  are  16-bit  two's  complement  integers.  Items  of 
type  "char"  are  0-bit  values  which  can  be  interpreted  as 
characters  or  as  two's  complement  integers.  Strings  are 
represented  as  arrays  of  characters.  Items  of  type  "float" 
or  "double"  are  binary  floating  point  numbers  of  length  32 
and  6^1  bits  reoectively.  An  item  of  type  "struct"  consists 
of  a  group  of  item  declarations  (possibly  including  arrays) 
which  can  be  viewed  as  a  unit.  This  latter  capability  pro¬ 
vides  for  user  definition  of  a  theoretically  infinite  number 
of  data  types. 

C  provides  a  large  number  of  binary  and  unary  arith¬ 
metic  and  logical  ooerators.  Arithmetic  ooerations  provided 
are  addition  and  subtraction/  mu  1 t i o 1 i c a t i on  and  division/ 
incrementation  and  Oec r emen t a t i on /  and  bit-wise  OR/  AND  and 
comolement.  Logical  operators  allow  expressions  to  be  com¬ 
pared/  logically  AND'ed/  logically  OR'ed  and  logically  com¬ 
plemented.  Mo  distinction  exists  between  a  logical 
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expression  end  an  arithmetic  expression.  Any  expression  has 
a  true  value  if  and  only  if  it  evaluates  to  a  non-zero 
value. 


Assignment  statements  are  orovided  in  C  which  are 
unusual  in  the  following  ways.  An  assignment  statement  can 
be  used  as  an  expression  and  has  the  value  that  was  assigned 
to  the  variable  on  the  left  hand  side  of  the  assignment 
statement.  A  number  of  assianment  operators  exist  which 
cause  a  binary  ooeration  to  take  place  between  the  left  hand 
side  and  the  evaluated  riaht  hand  side  orior  to  storage  of 
the  value  (e.g.  "x  =+  2?"  adds  two  to  "x"). 

The  major  control  statements  in  C  are  "while”/  "do- 
while"/  "for"/  "switch"/  "aoto"/  "break"  and  "continue". 
The  "while"  statements  causes  execution  of  a  aroup  of  state¬ 
ments  as  long  as  an  expression  is  true.  A  "do-while"  state¬ 
ment  is  like  a  "while"  exceot  that  the  control  expression  is 
evaluated  after  the  execution  of  the  group  of  statements. 
Therefore/  the  "do-while"  statement  is  always  executed  at 
least  once.  The  "for"  statement  is  an  extension  of  the 
while  which  provides  control  variable  initialization  and 
loop  incrementation.  The  "switch"  statement  allows  the  exe¬ 
cution  of  one  of  a  aroup  of  statements  labeled  as  cases 
based  on  the  value  of  an  exoression.  The  "goto"  statement 
transfers  control  to  a  labeled  statement  in  the  usual 
fashion.  "Break"  and  "continue"  exist  to  orovide  for 
label-free  loop  termination  and  skippina. 


A  subroutine  library  is  provided  for  use  with  C  pro 


grams.  It  contains  system  calls  for  I/O  and  other 
tipns.  In  addition,  it  contains  routines  for  formated 
put  and  for  the  standard  functions  of  analysis.  For 
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complete  description  of  C,  see  Ref.  10  and  18 


Ill 


implementation  of  the  cooasyl  design 


A.  ImDlementation  PhilosoDhy, 

The  overriding  cons i derat i on  in  implementing  the  DBMS 
was  to  avoid  any  modifications  or  additions  to  the  existing 
UNIX  facilities.  This  decision  was  made  for  a  number  of 
reasons.  Firsts  other  research  is  being  conducted  in  the 
Computer  Laboratory  utilizing  the  UNIX  operatino  system  as  a 
research  tool.  Running  systems  which  regui re  non-standard 
veresions  of  UNIX  interfers  with  the  control  environment  and 
generally  makes  other  operating  system  modifications  more 
difficult.  Seconds  a  modification  to  the  operating  system 
must  be  re“applied  whenever  a  new  release  of  UNIX  is 
installed.  Thirds  the  chances  of  the  DBMS  heina  transported 
to  other  UNIX  sites  is  far  greater  if  it  runs  under  a  stan¬ 
dard  UNIX.  Finally^  the  research  goal  of  determining  if  the 
DBMS  could  be  implemented  in  a  variety  of  environments  would 
be  subverted  by  modifying  the  operating  system  environment. 

Since  the  most  notable  feature  in  UNIX  is  the  design  of 
its  file  system^  it  was  decided  to  utilize  the  file  system 
whenever  possible  rather  than  acouiring  a  large  block  of 
physical  media  space  and  letting  the  DBMS  manage  it.  This 
philosophy  was  expected  to  simplify  the  problem  of  mapping 
data  to  media  and  thereby  reduce  the  size  and  complexity  of 
the  DBMS  and  insulate  the  DBMS  from  changes  in  the  hardware. 


The  final  guideline  was  to  implement  as  large  a  useful 
subset  of  the  features  in  the  COOASYL  design  as  feasible 
under  the  above  assumptions.  Creative  extensions  to  the 
CODASYL  design  were  avoided  since  these  would  tend  to 
obscure  the  research  goal  of  beina  able  to  measure  the  util¬ 
ity  of)  the  CODASYL  network  model  against  the  INGRESS  rela¬ 
tional  model.  Efforts  were  directed  instead  to  the  realiza¬ 
tion  of  the  goals  of  the  CODASYL  DDLCf  which  are  verv  ambi¬ 
tious  in  themselves. 

None  of  the  above  assumotions  should  be  taken  as  pre¬ 
cluding  the  possibility  of  future  modifications  to  enhance 
the  implementation  either  of  UNIX  or  of  the  features  of  the 
CODASYL  design.  The  intent  of  the  implementation  philosophy 
described  herein  was  to  produce  a  standard  CODASYL  DB^S  run¬ 
ning  under  a  standard  UNIX  for  use  as  a  baseline  product. 

B.  Organization  of  a  Data  Base. 

Virtually  all  information  about  the  data  base  described 
by  a  particular  schema  is  contained  in  a  special  directory. 
The  only  exceptions  are  certain  files  which  are  created  for 
the  life  of  a  user  process  and  then  discarded.  The  data 
base»  its  schema  and  its  directory  all  have  the  same  name. 
Although  it  is  possible  for  directories  with  the  same  name 
to  exist  in  a  UNIX  file  system,  no  two  data  bases  should 
haye  the  same  name.  The  files  within  the  directory  associ¬ 
ated  with  a  data  base  (called  a  schema  directory)  contain 
the  source  and  object  schemas,  the  schema  Data  Base  Manaaer 


(DBM)  program  and  al)  the  non-t emoo rary  data  within  the  data 


base.  Specific  files  will  be  mentioned  when  they  are 
relevant  to  the  discussion,  Apoendix  B  contains  a  complete 
listing  of  the  files  associated  with  a  schema, 

C,  Operating  Environment, 

The  environment  for  both  system  maintenance  and  user 
access  of  the  data  base  is  provided  by  the  DBM  Request  Pro¬ 
cessor  (”dbm”).  This  program  is  a  general  purpose  command 
language  processor  used  to  orovide  interface  with  any  data 
base.  Appendix  C  contains  a  description  of  the  functions  of 
the  DBM  Request  Processor, 

When  a  user  wishes  to  execute  a  program  which  accesses  a 
data  base#  he  executes  dbm  and  specifies  the  appropriate 
schema  name.  He  then  gives  dbm  an  ”x”  command  and  specifies 
a  path  name  to  the  user  program  and  arguments  to  be  passed 
to  the  program,  Dbm  opens  two  Pipes  as  interprocess  commun¬ 
ication  channels  and  forks  off  two  children.  Through  "exec” 
calls#  these  processes  become  the  schema  DBM  orogram  and  the 
requested  user  program  respectively.  Both  programs  are 
passed  the  file  descriptors  of  their  respective  ends  of  the 
interprocess  communication  oioes  as  part  of  their  calling 
arguments.  The  child  destined  to  become  the  schema  03M 
changes  directories  ("chdir”)  to  the  schema  directory  prior 
to  executing  the  schema  DBM, 


Once  the  user  and  the  schema  DBM  orogram  are  esta- 
blishedr  dbm  waits  until  they  have  terminated  before  accept¬ 
ing  any  more  commands*  During  execution^  the  user  program 
and  the  schema  DBM  pass  reauests  and  data  through  the  two 
pipes  with  the  user  proaram  executing  in  the  user’s  working 
directory  and  the  schema  DBM  executing  in  the  schema  direc¬ 
tory*  It  is  possible  for  other  users  to  execute  con¬ 
currently  using  the  same  data  base?  however^  each  user  has 
his  own  version  of  the  schema  DBM  and  a  separate  set  of 
interprocess  pipes. 

This  operating  environment  differs  from  the  one 
envisioned  by  the  CODASYL  DOLC  in  that  each  user  process  is 
interfaced  to  its  own  copy  of  the  DBMS  routines*  Each  copy 
has  its  own  buffers  and  no  knowledge  of  the  existence  of 
other  copies#  except  that  which  it  can  derive  from  the  state 
of  files  within  the  schema  directory*  In  contrast#  the 
CODASYL  designers  described  implementation  of  a  sinale  copy 
of  the  DB^^S  routines  which  would  concurrently  communicate 
with  all  the  users  and  have  communal  system  buffers  for  ser¬ 
vicing  all  user  reauests  tRef.  21.  The  reasons  for  this 

difference  are  twofold. 

/ 

First#  even  though  pines  are  the  only  reasonable  method 
for  interprocess  communication#  they  are  limited  in  that  two 
processes  may  communicate  only  via  a  pipe  ooened  by  a  common 
ancestor.  In  general#  the  only  common  ancestor  of  processes 
spawned  by  different  users  is  UNIX  and#  although  the  mechan¬ 
ism  exists  for  finaing  the  process  id  of  a  process  (**os”)# 


no  mechanism  exists  for  reauestina  UNIX  to  ooen  a  pipe  to  a 


designated  process. 

Seconds  even  if  a  mechanism  existed  to  connect  a  single 
copy  of  the  schema  DBM  to  usersf  that  single  copy  could  not 
muster  sufficient  resources  to  service  them.  In  particular/ 
a  process  may  have  only  fourteen  simultaneously  ooen  files 
and  each  user  would  reauire  two  files  ooen  (its  pipes)  on  a 
dedicated  basis.  Thus/  since  access  to  an  area  requires  two 
files  to  be  ooen/  a  single  schema  DBM  having  several  users 
each  requiring  several  areas  would  develop  a  thrashing  con¬ 
dition  in  which  almost  every  access  to  the  data  base  would 
incur  the  overhead  of  two  file  ooens  and  two  file  closes. 
Additionally/  a  problem  with  memory  buffer  contention  might 
arise/  although  this  oroblem  would  orobably  be  less  criti¬ 
cal. 


The  existence  of  separate  copies  of  the  schema  D8M  does 
not  mean  that  the  program  must  be  duplicated  in  memory  for 
each  of  its  current  users.  UNIX  provides  a  facility  for 
processes  executing  the  same  program  to  share  the  same  text 
segment/  thus  only  the  data  and  stack  segments  are  repli¬ 
cated  for  each  process.  The  consequences  of  the  multiple 
schema  DBM  environment  will  be  discussed  later. 


D,  Source  and  Object  Schemas 


i-^hen  a  new  data  base  is  to  be  created^  a  source  schema 
description  must  be  prepared.  A  schema  directory  should  be 
created  (usina  the  UNIX  function  ”mkdir”)  to  contain  the 
source  language  version  of  the  schema.  The  source  schema 
description  must  reside  in  a  file  whose  name  is  formed  by 
prefixing  the  schema  name  with  ”s.’’  and  which  is  located  in 
the  schema  directory.  The  UNIX  text  editor  (”ed”)  is  suit¬ 
able  for  entering  the  source  schema  description.  The  source 
schema  is  coded  in  a  modified  form  of  the  CODASYL  DDL 
described  in  Ref.  2.  Differences  between  the  DDL  of  Ref.  2 
and  the  UNIX  OB^S  DDL  are  discussed  in  Apoendix  G. 

Once  the  source  description  of  the  schema  is  entered#  it 
must  be  compiled  into  an  object  version.  This  compilation 
is  accomplished  via  the  '*c**  command  of  the  DBM  Reauest  Pro¬ 
cessor.  The  object  version  of  the  schema  consists  of  two 
files  which  contain  the  schema  DBM  program  and  the  encoded 
schema  description#  respectively.  The  schema  DBM  interprets 
and  services  all  user  reauests  for  access  to  the  data  base# 
while  the  encoded  schema  description  is  a  compact  symbolic 
form  of  the  schema’s  structure.  The  name  of  the  file  con¬ 
taining  the  schema  DBM  is  the  schema  name  prefixed  by 
”dbm.”.  The  schema  DBM  is  discussed  in  Section  III.L  below. 
The  encoded  schema  description  file  is  used  to  initialize 
the  schema  DBM  proQram  and  for  information  about  the  data 
base  during  the  move  and  Garbage  collection  functions  of  the 
DBM  Request  Processor.  Appendix  D  contains  a  description  of 


the  format  of  the  schema  description  file 


E*  Interprocess  Communication* 

The  schema  DBM  ana  the  user  process  communicate  via  the 
pipes  set  up  for  them  bv  dbm*  These  pipes  may  be  read  and 
written  just  as  if  they  were  ordinary  ooen  files.  Messages 
of  a  predefined  format  are  sent  and  received  by  both 
processes.  The  first  messaae  sent  is  the  initial  call  mes¬ 
sage  from  the  user  process*  This  message  is  triggered  by 
the  C  DML  "permit*’  function  and  contains  an  encoded  descrip¬ 
tion  of  the  sub-schema.  The  schema  DBM  response  to  the  ini¬ 
tial  call  includes  the  index  numbers  for  all  the  entities 
and  attributes  contained  in  the  sub-schema  description. 
Subseauent  user  program  messaoes  are  reauests  for  data 
retrieval  or  update  and  are  made  utilizing  the' index  numbers 
acquired  in  the  initial  call.  Since  the  schema  DBM  will 
receive  an  end  of  file  condition  when  trying  to  read  the 
interprocess  channel  after  user  termination#  no  indication 
need  be  given  to  the  schema  DBM  that  the  user  program  has 
terminated. 

Messages  sent  bv  the  schema  DBM  fall  into  two 
categories!  normal  responses  and  error  messages.  The  error 
codes  in  error  messaaes  correspond  to  those  used  by  the  C 
OML.  For  a  description  of  the  format  of  all  the  interpro¬ 
cess  messages  see  Appendix  E. 


as 


F.  Data  Base  Keys 


In  the  COOASYL  DB^’S  each  record  must  be  identified  by  a 
unique  value  called  its  data  base  key.  This  key  is  assigned 
when  the  record  is  created  and  remains  with  it  for  the  life 
of  the  record.  The  ability  to  map  a  record's  data  base  key 
to  the  record  in  a  quick  and  unambiguous  fashion  must  be 
provided  since  the  key  is  used  for  direct  access.  The  key's 
order  relative  to  all  other  keys  in  the  area  must  be  well 
defined  since  it  is  used  for  sequential  access.  However^ 
the  record  is  allowed  to  move  around  in  physical  media  space 
as  lonq  as  it  stays  within  the  same  area.  Section  III.G 
below  will  discuss  how  the  problem  of  satisfying  all  the 
criteria  for  data  base  keys  was  resolved. 

The  format  of  a  data  base  key  is  shown  in  Fig.  1.  This 
format  makes  possible  255  areas  (area  zero  is  the  null  area) 
each  containing  up  to  16»777,215  recoros. 

Bits:  31  2a  23  0 
Fields:  1  area  a  I  record  #  in  area  ! 


Data  Base  Key  Format. 
Figure  1  . 


^hen  a  record  is  first  created  and  assioned  to  an  area/ 
that  area's  inde*  number  becomes  part  of  its  data  base  key. 
It  is  not  Possible^  therefore/  for  a  record  to  migrate  to 


another  area.  The  record  number  is  a  ourely  logical  order¬ 
ing  and  is  implemented  as  described  below. 

G,  Area  Handl i ng. 

Each  area  specified  in  the  schema  has  associated  with  it 
two  files.  The  first  of  these  is  the  file  containing  the 
data  stored  in  the  area.  Its  name  is  the  same  as  the  name 
of  the  area.  The  second  file  is  the  data  base  key  file  for 
the  area.  Its  name  is  the  area  name  prefixed  by  "k.”. 

The  data  base  key  file  is  organized  into  24-bit  entries. 
Each  entry  contains  the  starting  byte  offset  into  the  area 
data  file  of  the  record  associated  with  a  particular  data 
base  key  for  that  area.  The  data  base  keys  are  mapped  to 
the  entries  sequentially.  That  is/  multiplying  the  record 
offset  portion  of  the  data  base  key  by  three  yields  the 
starting  byte  offset  -of  the  entry  in  the  key  file  associated 
with  that  data  base  key.  If  the  value  of  the  key  file  entry 
for  a  data  base  key  is  zero  then  that  key  is  null  (i.e. 
unassigned).  The  first  entry  slot  in  the  data  base  key  file 
for  each  area  is  reserved  for  storing  the  highest  used  key 
in  the  area  to  facilitate  sequential  searching  of  an  area. 
Thus  record  number  zero  is  undefined  in  each  area. 

Data  records  are  stored  in  the  data  file  with  a  three 
word  prefix.  The  first  three  bytes  of  the  prefix  contain 
the  record  number  portion  of  the  data  base  key  for  that 
record.  The  fourth  byte  contains  the  record  type  (a  total 
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255  tyoes  are  oossible).  The  last  two  bytes  of  the  pre¬ 
fix  contain  the  si?e  in  bytes  of  the  record  (maximum  record 
size  is  therefore  32767  bytes). 

As  records  are  created  and  added  to  the  arear  they  are 
entered  seouentially  in  the  area  data  file  follow inq  the 
last  record  written  (data  base  keys  may  be  assigned  by  any 
algorithmr  however).  The  records  will  remain  in  their  ori¬ 
ginal  locations  until  they  are  deleted  or  moved  during  gar¬ 
bage  collection.  If  a  record  is  moved  to  a  new  locationr 
its  key  file  entry  is  updated  accordingly.  Whenever  a 
record  is  moved  or  deleted  from  the  area  data  filer  the 
first  two  words  of  the  prefix  at  its  former  location  are 
zeroed . 

The  positioning  control  mechanism  provided  in  the  schema 
DDL  is  implemented  via  data  base  keys.  The  area  control  is 
handled  in  a  trivial  fashion  since  the  area  index  number  is 
a  part  of  the  data  base  key.  The  positioning  of  a  record 
"near”  another  record  is  accomplished  by  assigning  the 
record  being  added  the  next  available  data  base  key  follow¬ 
ing  the  data  base  key  of  the  record  it  is  to  be  "near". 
This  method  speeds  access  to  records  clustered  "near"  one 
another  when  they  are  used  in  conjunction  with  each  another 
since  their  data  base  key  file  entries  are  likely  to  be  in 
the  same  block.  Additionally/  the  oarbage  collection  func¬ 
tion  of  the  DRM  Request  Processor  automatically  re-seguences 
the  records  in  the  area  data  file  to  be  in  ascending  order 
of  data  base  key.  After  garbage  collection  the  records  are 
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clustered  in  the  area  data  file  as  well. 

If  a  data  base  kev  assianment  algorithm  causes  a  sparse 
key  space/  the  storage  needed  for  recording  the  kev  entries 
is  minimized  by  the  fact  that  UNIX  only  allocates  storage 
for  blocks  actually  accessed.  For  example/  if  a  data  base 
key  were  to  be  allocated  whose  key  entry  block  would  be  200 
blocks  beyond  the  current  end  of  the  data  base  key  file/ 
only  the  block  containing  that  entry  would  be  allocated. 
Even  though  the  apparent  size  of  the  file  would  have 
increased  by  200  blocks/  the  intervening  109  blocks  would 
not  be  assigned  any  physical  media  space.  Unfortunately/  if 
an  empty  block  is  read/  space  for  it  is  allocated.  This 
means  that  if  the  area  in  the  above  example  were  ever 
scanned  sequentially/  all  the  non-a 1 1 oc at ed  blocks  in  the 
data  base  kev  file  would  be  allocated. 

Due  to  the  deletion  and  addition  algorithms/  gaps  will 
develop  in  the  data  file  during  the  course  of  processing. 
The  total  size  (in  bytes)  of  these  gaps  is  maintained  in  the 
first  four  bytes  of  the  area  data  file.  During  the  execu¬ 
tion  of  the  schema  the  amount  of  wasted  space  is  accu¬ 
mulated  and  at  the  end  of  the  run  the  area  data  file  waste 
count  is  incremented.  Since  this  method  permits  more  than 
two  billion  waste  bytes  to  be  accumulated/  no  provision  is 
mage  for  overflowing  the  waste  count.  When  the  waste  count 
gets  to  an  unacceptable  size/  tha  DBM  Reouest  Processor  can 
be  used  to  effect  garbage  collection. 
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Areas  which  are  designated  as  temporary  areas  are  han¬ 
dled  in  a  slightly  different  manner.  Since  a  temporary  area 
is  local  to  the  user  process  opening  it»  the  file  names  for 
such  an  area  are  suffixed  with  the  process  id  of  the  user 
process.  Since  the  process  id  uniguely  identifies  the  pro- 
cessf  these  names  uniguely  identify  a  particular  version  of 
a  temporary  area.  Add i t i ona 1 1 y f  the  files  associated  with  a 
temporary  area  are  allocated  in  the  "tmp*'  directory  and  are 
deleted  when  the  process  is  terminated.  The  "tmp"  directory 
has  the  characteristic  that  if  a  system  crash  occurs^  the 
files  within  it  are  lost. 


The  files  associated  with  any  area  are  automatically 
created  by  the  schema  DBM  if  it  attempts  to  open  them  and 
they  do  not  exist.  This  means  that  when  a  schema  is  first 
created#  its  areas  will  come  into  being  automatically  as 
soon  as  they  are  needed. 


H.  Access  Methods. 

There  are  five  access  methods  which  may  be  used  for 
locating  a  record  in  the  data  base!  direct#  seouential# 
calculated#  chained  and  indexed. 


1.  Direct  Access. 

If  the  data  base  key  of  a  record  is  known#  it  may  be 
accessed  directlv  using  the  data  base  key  mapping  mechanism 
described  above.  Every  access  to  the  data  base  ultimately 
involves  direct  access  once  the  data  base  key  is  known. 
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Unadorned  direct  access  is  provided  to  the  user  through 
record  currency  and  through  explicit  key  record  selection 
expressions  (see  Section  IV  below). 

2.  Sequential  Area  Scan. 

The  **next”  and  ”  prior’’  records  in  an  area  are 
accessed  through  a  sequential  scan.  The  algorithm  succes¬ 
sively  increments  or  decrements  the  record  number  in  the 
current  data  base  kev  until  the  next  or  previous  non-null 
data  base  key  is  found. 

3.  Calculated  or  Hashed  Access. 

« 

A  data  base  key  may  be  develooed  by  a  hashing  algo¬ 
rithm  which  uses  data  in  a  record  for  a  hash  key.  The 
schema  record  entry  for  records  accessed  by  hashina  must 
have  a  location  mode  of  "CALC”.  During  record  creation^ 
CALC  key  collisions  are  resolved  by  a  forward  linear  scan 
until  a  null  kev  is  developed.  A  key  link  is  established  in 
the  synonym  record  to  enable  future  access  to  the  new 
record.  If  multiple  collisions  occur  on  the  same  data  base 
key»  a  linked  list  is  developed  leading  to  the  last  synonym 
added.  A  standard  schema  DBM  utility  routine  is  used  for 
hashing  ("randkey").  If  non-standard  hashing  for  a  record 
is  desired»  a  data  base  procedure  may  be  specified  in  the 
location  mode  clause  of  the  record's  schema  entry.  When  a 
record  of  a  type  using  a  non-standard  hashing  procedure  is 
to  be  added/  the  data  base  procedure  declared  in  the  loca¬ 


tion  mode  clause  for  the  record  will  be  called  to  provide 


X 


the  data  base  key 


4*  Chained  Access. 

The  default  method  of  set  linkage  is  via  chaining. 
The  links  in  a  chain  consist  of  data  base  keys  stored  in  the 
records  to  be  linked.  The  owner  record  of  a  set  contains 
links  to  the  first  and  last  member  records  in  the  set.  Each 
member  record  contains  a  link  to  the  next  record  in  the  set. 
If  a  set  is  defined  as  ’’PRIOR  PROCESSABLE”  in  the  schemar 
each  member  record  has  a  link  to  the  previous  member  of  the 
set.  A  member  record  defined  as  ’’LINKED  TO  OWNER”  in  the 
schema  will  contain  a  link  to  the  owner  record  of  the  set. 
The  link-to-next-record  in  the  last  record  of  a  set  and  the 
1 i n k-t o-pre V i ous-reco rd  in  the  first  record  of  a  set  both 
point  to  the  owner  record  of  the  set* 

5.  Indexed  Access. 

Sets  which  are  singular  or  dynamic  have  indices  as 
their  primary  access  method.  Additionally/  indices  are  used 
for  secondary  set  linkage  to  implement  ’’SEARCH”  keys  defined 
in  the  schema  for  a  record  type.  An  index  consists  of  a 
list  of  data  base  keys  ordered  as  specified  in  an  ’’ORDER”  or 
”SEARCH”  clause*  When  record  selection  is  through  set 
membership  with  data  field  values  soecified  (see  Apoendix  A/ 
Section  8.1. a)/  an  appropriate  index  will  be  used  to  gain 
access  to  the  record.  See  Section  III. I  below  for  a  further 
discussion  of  indices. 
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I.  The  Schema  Index  File. 

All  indices  created  in  the  data  base  are  stored  in  the 
schema  index  file.  The  name  of  this  file  is  the  schema  name 
prefixed  by  "index,".  Each  index  in  the  file  is  oraanized 
into  512  byte  blocks  each  of  which  has  the  format  shown  in 
Fig,  2 . 

St  ruc  t  i  b 1 oc  k  { 

int  iblink;  //  link  to  previous  index  block 

int  iflink;  //  link  to  next  index  block 

char  ientry  15081  ;  //  up  to  127  index  entries 

> 

Format  of  an  Index  Block, 

Figure  2 . 

When  the  backward  link  field  (iblink)  in  the  first  block 
of  an  index  is  zero ,  the  index  is  not  in  use.  Minus  one 
indicates  that  it  is  in  use.  The  forward  link  field 
(iflink)  of  the  last  block  in  the  index  is  zero.  The  entry 
array  (ientry)  contains  uo  to  127,  four  byte  data  base  keys. 
Null  entries  at  the  end  of  an  index  block  are  all  zeroes. 
When  an  index  is  first  created,  seven  empty  entry  slots  are 
left  at  the  end  of  each  block  for  future  growth. 

An  index  is  searched  by  a  modified  form  of  binary 
search.  When  a  record  is  to  be  located  via  the  index,  the 
first  index  block  is  retrieved  and  the  records  co r respond i no 
to  the  first  and  last  data  base  kevs  entered  in  the  block 
are  examined.  If  these  records  bracket  the  desired  record, 
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a  binary  search  is  conducted  throuqh  the  index  block  to  find 
the  desired  record.  If  the  desired  record  is  not  associated 
with  the  index  blocks  subseauent  index  blocks  are  read  and 
the  last  record  for  each  block  is  examined  to  determine  if 
it  brackets  the  desired  record.  When  the  correct  block  is 
found#  a  binary  search  of  that  block  is  used  to  find  the 
record.  For  an  index  with  k  blocks  havinq  an  average  of  n 
entries  in  each  block#  the  average  number  of  records  exam¬ 
ined  in  locating  a  record  is  aoprox i matel y  (k  /  2)  f  m# 
where  m  is  the  log  base  two  of  n. 

When  data  base  keys  are  added  to  the  last  block  of  an 
index#  a  new  last  index  block  is  created  whenever  six  or 
fewer  empty  slots  remain  in  the  block.  If  a  block  other 
than  the  last  block  overflows  while  a  data  base  key  is  being 
added#  a  new  block  is  inserted  into  the  sequence  of  blocks. 
The  last  seven  entries  of  the  old  block  are  copied  into  the 
new  block  and  the  new  data  base  key  entry  -is  added.  When¬ 
ever  the  last  key  remaining  in  a  block  is  deleted#  the  block 
is  removed  from  the  index  and  freed. 

Indices  are  used  for  set  linkage  as  well  as  to  facili¬ 
tate  the  maintenance  of  a  ’’NO  DUPLICATES'*  clause  refer  ing  to 
items  not  linked  in  another  way.  Indices  are  linked  to  the 
records  when  needed  by  storing  the  starting  block  number  of 
the  index  in  the  record. 

In  the  absence  of  P  and  V  ooerators  (Ref.  221#  an  index 
may  be  reserved  for  use  by  a  particular  copy  of  the  schema 
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DBM  in  the  followinq  tortuous  manner.  When  an  index  is  to 
be  accessed/  the  schema  DBM  attempts  to  create  a  file 
("indexdum")  in  the  schema  directory.  This  file  is  created 
with  a  mode  that  does  not  allow  writing/  therefore  should 
another  process  attempt  to  create  "indexdum"  while  it  is 
open/  the  attempt  will  fail.  If  a  create  fails/  it  is 
repeated  until  successful.  Once  the  "indexdum”  file  has 
been  created/  the  first  block  of  the  index  is  read  from  the 
index  file.  If  the  backward  link  field  of  the  first  block 
is  minus  one,  "indexdum"  is  closed  and  the  above  process  is 
repeated  until  the  backward  link  is  zero.  The  backward  link 
is  then  set  to  minus  one  and  the  block  written.  The  "index¬ 
dum”  file  is  then  destroyed.  If  an  index  is  not  available/ 
the  schema  DBM  releases  any  indices  allocated  in  order  to 
avoid  deadlocks. 

When  an  index  is  to  be  released/  the  backward  link  of 
the  first  index  block  is  set  to  zero  and  the  block  is 
rewritten.  Thus/  only  one  schema  O0M  can  use  a  particular 
index  at  any  given  time.  This  system  avoids  integrity  prob¬ 
lems  stemming  from  simultaneous  update  of  an  index  block  by 
two  different  cooies  of  the  schema  DBM, 

During  the  course  of  day”tO”day  ooerations/  indices  will 
be  created  as  well  as  discarded.  When  an  index  is  of  no 
further  use/  its  blocks  must  be  made  available  for  recylc- 
ing.  The  free  blocks  thus  created/  are  accounted  for  on  a 
free  block  list  similar  to  the  free  list  in  a  UNIX  super 
block.  Block  zero  of  the  schema  index  file  is  used  to 
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contain  the  free  list.  The  first  word  of  block  zero  con¬ 
tains  the  block  number  of  a  free  list  block  or  zero  if  none 
exist.  The  remaining  255  words  are  used  to  store  the  block 
numbers  of  free  blocks. 

The  free  list  is  maintained  as  follows.  When  a  block 
must  be  added  to  the  free  list^  its  block  number  is  stored 
in  the  first  available  slot  in  the  free  block.  If  all  the 
slots  are  taken^  block  zero  is  cooied  into  the  free  block# 
the  block  number  of  this  free  block  is  stored  in  the  first 
word  of  block  zero#  and  the  remainder  of  block  zero  is 
filled  with  zeroes.  If  a  free  block  must  be  allocated  to  an 
index#  the  last  non-null  block  number  is  extracted  from 
block  zero  and  the  slot  is  cleared.  If  no  free  blocks  are 
on  the  list  and  word  zero  contains  a  block  number#  that 
block  is  allocated  to  the  index  after  first  copying  its  con¬ 
tents  into  block  zero.  If  block  zero  is  all  zeroes#  a  new 
block  is  added  to  the  end  of  the  file  for  use  in  the  index. 
Other  schema  DBM  processes  are  locked  out  durina  block 
aouisition  and  freeino  by  the  same  mechanism  used  to  gain 
control  of  an  index. 

J.  Privacy. 

The  CODASYL  DBMS  design  allows  for  orivacy  locks  to  be 
established  at  all  levels.  It  allows  separate  orivacv  locks 
for  each  function  on  a  resource.  Additionally#  it  allows  a 
orivacy  lock  to  be  either  a  strina  to  be  matched  or  a  lock 
orocedure.  The  UNIX  implementation  features  all  these 
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options.  Theif*  i  up  1  e^'en  t  a  t  i  on  is  accomplished  as  follows. 

When  the  files  in  the  schema  directory  are  created/  UNIX 
establishes  the  installation's  Data  Base  Administrator  as 
their  owner.  By  making  the  access  privileges  of  a  file  read 
and  write  for  owner  only  (UNIX  function  "chmod")/  the  Data 
Base  Administrator  can  prohibit  all  other  users  in  the  sys¬ 
tem  from  opening  the  file.  Thus/  only  the  Data  Base 
Administrator  (or  super  user)  can  directly  read  or  write  a 
schema  file. 

The  DBM  Request  Processor  may  be  used  for  both  system 
maintenance  and  to  initiate  user  execution.  However/  since 
the  DBM  Request  Processor  does  not  set-user-id/  it  can  only 
perform  system  maintenance  functions  when  used  by  the  Data 
Base  Administrator. 

Since  the  schema  DBM  program  file  is  executable  by  any 
user/  the  DBM  Request  Processor  can  initiate  it  to  process 
user  reauests.  Since  the  schema  DBM  does  a  set-user-id  to 
the  user  id  of  the  Data  Base  Administrator/  it  can  access 
the  schema  files  as  reauired.  A  user  must  not  be  able  to 
penetrate  the  schema  DBM  to  gain  access  to  information  for 
which  he  does  not  have  the  privacy  keys. 

The  schema  DBM  prevents  unauthorized  access  using  the 
following  procedure.  The  schema  DBM  has  a  privacy  f 1 aq  for 
every  function/resource  pair  for  which  a  privacy  lock  can  be 
defined.  When  the  schema  DBM  validates  the  user's  sub¬ 
schema  (contained  in  the  initial  call  message)/  it  checks 
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the  privacy  keys  defined  in  the  sub-schema.  Each  privacy 
flag  for  which  no  lock  is  defined  or  for  which  the  sub¬ 
schema  privacy  key  is  validf  is  set  to  allow  access/  other¬ 
wise  it  is  set  to  deny  access.  The  user  receives  no  immedi¬ 
ate  indication  as  to  whether  or  not  his  privacy  keys  fit  the 
locks#  If  he  later  tries  to  access  some  data  base  resource 
in  a  way  for  which  he  did  not  furnish  acceptable  privacy 
keys/  his  request  fails#  Once  an  initial  response  message 
is  accepted  by  the  schema  DBMf  no  further  unlocking  of 
resources  can  be  done.  Thus/  in  order  to  access  the  denied 
resource/  the  user  proaram  must  be  terminated  and  restarted 
using  a  fresh  copy  of  the  schema  DBM  which  must  be  provided 
the  proper  orivacy  keys.  Thus/  no  single  execution  of  a 
user  program  can/  through  trial  and  error/  determine  the 
valid  privacy  keys. 

K#  Integrity. 

As  previously  mentioned/  the  CODASYL  DDLC  envisioned 
that  the  DBMS  routines  would  be  contained  in  a  single  pro¬ 
cess  which  would  service  all  users.  That  concept  guarantees 
the  integrity  of  the  data  base  since  simultaneous  update  of 
the  data  base  is  imoossible#  Additionally/  a  concent  called 
”keeD*^  status  is  included  in  the  COBOL  DML  (Ref#  33#  A 
record  has  automatic  ’’keep”  status  for  a  run-unit  during  the 
time  it  is  the  current  record  of  that  run-unit.  A  run-unit 
can  also  reauest  "keep”  status  for  a  record  if  it  desires  to 
be  informed  of  what  haooens  to  the  record.  If  a  run-unit 
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modifies  or  deletes  a  record  which  has  "keeo"  status  for 
another  run-unit»  the  run-unit  having  the  record  in  "keep" 
status  will  be  notified  of  the  action.  Although  the  "keep" 
mechanism  does  not  resolve  the  problem  of  concurrent  update^ 
it  does  provide  a  mechanism  for  identifying  potential  prob¬ 
lems.  "Keep"  status  allows  run-units  to  update  the  data 
base  while  still  allowing  access  to  it. 

Since  each  user  process  is  coupled  to  its  own  version  of 
the  schema  DBMf  none  of  the  above  features  can  be  readily 
implemented.  As  a  conseguence^  if  multiple  schema  DBM  pro¬ 
grams  concurrently  open  an  area  for  update/  data  base 
integrity  problems  are  virtually  assured.  If/  however/  all 
users  open  any  area  to  be  updated  for  protected  or  exclusive 
use  (see  Apoendix  A,  Section  B.2.b)/  no  integrity  problems 
can  arise. 

An  area  opened  for  protected  use  cannot  be  opened  by 
another  process  for  update.  An  area  ooened  for  exclusive 
use  cannot  be  opened  at  all  by  another  process.  The  mechan¬ 
ism  for  insuring  that  these  rules  are  enforced  is  the  Logi¬ 
cal  Usage  Block  File.  This  file  resides  in  the  "tmp"  direc¬ 
tory  and  has  the  same  name  as  the  schema  (hence  the  rule 
that  no  two  data  bases  may  have  same  name).  It  contains  the 
logical  usage  block  which  records  the  opening  mode  of  every 
area  currently  open  by  any  copy  of  the  schema  DBM.  When  a 
schema  DBM  desires  to  ooen  an  area/  it  reads  the  logical 
usage  block  and  determines  whether  or  not  a  conflict  exists 
between  the  opening  mode  it  desires  and  the  modes  in  use  by 
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other  processes  havinq  the  area  ooen*  If  no  conflict 
exists^  it  opens  the  area  and  updates  the  logical  usage 
block  accordingly?  otherwise^  it  notifies  the  user  that  the 
open  has  failed. 

In  order  to  avoid  problems  with  simultaneous  update  of 
the  logical  usage  block  bv  different  processes^  a  lock  out 
file  mechanism  similar  to  ”indexdum**  is  employed.  This  file 
is  named  "ooendum”  and  resides  in  the  schema  directory.  The 
^opendum”  file  is  created  to  lock  out  other  processes  while 
the  logical  usage  block  is  being  accesses. 

L.  The  Schema  DBM. 

As  mentioned  earlier^  the  DBMS  compiler  must 
schema  DBM  program  when  a  schema  is  compiled. 

DBM  program  is  composed  of  two  parts:  the  schema 
and  the  DBM  skeleton. 

1.  Schema  Constants. 

The  DBMS  compiler  must  produce  a  C  coded  temporary 
file  which  contains  all  the  schema  uniaue  constants  necces- 
sary  to  tailor  the  DBM  skeleton  to  the  schema  being  com¬ 
piled.  These  constants  cause  the  various  buffers  and  arrays 
used  by  the  DBM  proaram  to  be  allocated  sufficient  memory  to 
handle  the  schema.  For  a  descriotion  of  the  constants  and 

arrays  involvedseeAppendixH.  ^ 


produce  a 
The  schema 
const  ants 
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Add  i  t  i  ona  1  1  V  »  this  te'T'popary  file  must  include  the 
initialization  for  the  arrays  "procpoint"  and  "procname". 
These  arrays  contain,  respectively,  pointers  to  and  the 
names  of  all  the  data  base  procedures  in  the  schema.  When- 


ever  a  data  base  procedure 

name 

is  encountered 

during 

the 

Initialization 

phase  of 

the 

schema  DRM,  ” 

D  roc  name ” 

i  s 

searched  until 

the  matching 

name 

is  found  and  a 

po i n t  e  r 

t  o 

the  data  base  procedure  is  extracted  from  the  correspondi nq 
"procPoint"  entry. 

When  the  constant  file  has  been  generated,  the  DBMS 
compiler  can  use  the  C  compiler  to  form  the  schema  DBM  from 
the  constant  file  and  skeleton  DBM,  Since  pointers  to  the 
data  base  procedures  are  used  as  initializing  constants  in 
"procpoint",  all  the  data  base  procedures  will  automatically 
be  loaded  into  the  output  object  module  after  compilation. 
Both  the  DBM  skeleton  and  the  data  base  procedures  must 
exist  as  object  modules  available  to  the  C  compiler.  All 
the  external  arrays  which  are  dimensioned  in  the  constant 
file  are  declared  but  not  explicitly  dimensioned  in  the 
skeleton  DBM.  The  finished  product  of  the  C  compiler  is  an 
executable  schema  DBM, 

2,  The  DBM  Skeleton, 

The  DBM  skeleton  is  an  object  module  which  contains 
all  the  DBMS  routines  (exceot  data  base  procedures)  required 
to  provide  user  services  for  the  data  base.  Appendix  F  con¬ 
tains  a  complete  description  of  the  DBM  skeleton.  The  pro— 
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cessing  of  the  skeleton  (and  thus  the  schema  dbm)  is  divided 
into  two  phases:  initialization  and  user  reauest  processing, 

a.  Initialization  Phase. 

When  the  schema  DBM  is  called/  it  has  no  infor¬ 
mation  about  the  organization  of  the  schema  or  sub”schema. 
Although  all  its  buffers  are  the  right  size  and  all  the 
necessary  data  base  procedure  are  compiled  into  it/  it  has 
no  knowledge  of  data  base  names/  orivacy  locks/  set  rela¬ 
tionships  or  anv  other  data  peculiar  to  the  schema.  In 
order  to  function/  it  must  read  in  all  the  data  in  the 
schema  description  file.  Concurrently/  it  processes  the 
user  program's  sub-schema  which  is  oassed  in  the  user 
program's  initial  call  message.  By  validating  the  sub¬ 
schema  while  initializing  the  schema/  the  schema  DBM  can 
immediately  translate  all  references  into  terms  of  its 
internal  index  numbers  rather  than  store  data  base  names. 
This  avoids  much  of  the  matching  overhead  for  each  user 
reguest.  If  the  sub-schema  fails  the  validation/  the  schema 
DBM  sends  the  user  program  an  error  message/  returns  to  the 
beginning  of  the  schema  description  file  and  restarts  ini¬ 
tialization.  The  initialization  phase  will  thus  be  ter¬ 
minated  only  if  the  user  program  either  submits  a  valid 
sub-schema  or  terminates. 

b.  User  Reauest  Servicing  Phase. 

After  a  successful  initialization/  the  user 
reauest  servicing  phase  begins.  This  ohase  consists  of  a 
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loop  which  reads  a  user  messaqer  processes  itr  and  returns  a 
response  to  the  user  program.  The  loop  runs  until  user  pro¬ 
gram  termination/  at  which  point  the  schema  DBM  terminates. 


Processing  a  user  message  is  accomplished  by 
selecting  a  service  routine  based  on  the  message  type.  One 
service  routine  exists  for  each  message  type  except  the  ini¬ 
tial  call/  with  an  additional  routine  to  process  invalid 
message  types.  During  this  phase/  an  initial  call  is  con¬ 
sidered  an  invalid  message.  Each  service  routine  uses  one 
or  more  utility  routines.  Utility  routines  are  general  pur¬ 
pose  data  base  access  and  maintenance  primitives  which  may 
be  used  by  several  service  routines. 
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TV.  DESIGN  OF  THE  C  DDL  AND  DML 


A.  Design  Goals  and  Decisions. 

The  augmentation  to  the  C  language  was  designed  to  pro¬ 
vide  a  natural  interface  between  the  C  1 anguaae  and  the  DBHS 
without  reducing  its  ability  to  support  a  COBOL  DDL  and  DML. 
Accordingly,  the  C  DML  was  designed  to  have  as  much  func¬ 
tional  similarity  to  the  COBOL  DML  as  was  feasible.  This 
goal  was  adopted  to  support  the  research  objective  of  test¬ 
ing  CODASYL's  contention  that  the  DBMS  could  support  a 
variety  of  sub-schema  DDL's  and  DML's.  See  Section  C  below 
for  a  comparison  of  the  COBOL  and  C  DDL  and  DML. 

One  of  the  desirable  goals  of  a  DBMS  is  to  provide  pro¬ 
gram  independence  from  the  definition  of  the  data  base. 
Additionally,  one  of  the  orimary  design  philosophies  of  C 
was  economy  of  exoression.  In  order  to  facilitate  both  of 
these  goals,  the  C  DDL  provides  for  describing  only  a 
minimal  subset  of  the  relationshios  and  restrictions  which 
appear  within  a  schema  description.  The  DDL  is  restricted 
to  describing  the  names  and  orivacy  keys  for  areas,  records, 
data  items,  data  agaregates  and  sets?  and  the  membershio  of 
record  types  in  set  types.  Data  tyoes  of  items  and  aggre¬ 
gates  must  be  soecified  as  well,  but  this  information  may  be 
different  from  the  data  tvoes  recorded  in  the  schema 
description.  Additionally,  the  DDL  need  only  describe  those 


portions  of  the  schema  the  program  is  interested  in  maniou" 
lating.  Since  this  information  is  the  only  data  absolutely 
necessary  to  the  DVL»  unless  major  changes  affecting  the 
validity  of  the  program  logic  occur  in  the  schema/  a  recom¬ 
pilation  of  the  program  should  seldom  be  needed.  The  pro¬ 
grammer  obviously  needs  to  know  a  lot  more  about  the  schema/ 
however  he  can  (and  should)  obtain  this  information  from  an 
installation  Data  Element  Dictionary  (Ref.  191. 

A  fourth  goal  was  to  integrate  the  C  DDL  and  OML  into 
the  host  languaae's  structure  whenever  possible.  Accord¬ 
ingly/  the  DDL  was  grouped  into  a  soecial  external  function 
and  the  OML  functions  have  been  given  formats  similar  to 
other  C  special  functions  such  as  "return".  DML  logical 
expressions  are  comoatible  with  normal  C  expressions. 

B.  Major  Concents. 

This  section  describes  some  of  the  concepts  essential  to 
the  implementation  and  use  of  the  C  DDL  and  DML.  For  a  com- 
nlete  definition  of  the  C  DDL  and  DML/  see  Apoendix  A. 

1 .  Currency . 

The  concept  of  currency  is  central  to  the  navigating 
of  access  paths  in  the  DML.  The  user  process  as  well  as 
each  record/  set  and  area  tvne  known  to  it  have  a  current 
record  associated  with  them.  This  currency  is  established 
by  the  execution  of  a  "find".  When  a  record  is  found/  it 
becomes  the  current  record  of  the  orocess/  its  record  tyoe/ 
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the  area  in  which  it  resides  and  the  set  tyoe  of  every  set 
occurence  the  record  participates  in  as  a  member  or  owner. 
The  current  set  occurence  of  each  set  type  is  the  set  in 
which  the  current  record  of  that  set  type  participates. 

Information  about  a  records  including  data  values  if 
the  record  was  fetched  by  continues  to  be  available 
until  the  record  is  replaced  as  the  current  record  every¬ 
where  its  currency  was  originally  established.  For  example^ 
if  a  record  is  the  current  record  of  a  particular  area»  it 
will  remain  available  as  the  current  record  of  that  area 
until  a  "find"  is  executed  which  selects  a  different  record 
residing  in  the  same  area. 

2.  Find  versus  Get. 

The  difference  between  the  "find"  and  "get"  func¬ 
tions  is  that  the  former  locates  records  in  the  data  base 
while  the  latter  extracts  data  values  from  the  data  base. 
When  a  "find"  is  executed#  the  DBMS  spans  the  access  oaths 
specified  in  the  record  selection  criterion  and  returns  all 
the  information  about  a  record  necessary  to  make  it  current 
in  the  appropriate  places.  For  a  description  of  the  record 
selection  options  available#  see  Aooendix  A#  Section  B.l.b. 
The  information  necessary  to  establish  currency  includes  the 
selected  record's  data  base  key#  record  type  and  the  set 
types  for  the  sets  it  participates  in  as  a  member  or  owner. 
The  area  the  record  resides  in  can  be  derived  from  its  data 
base  key. 
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A  "qet"  is  used  to  access  the  data  values  associated 
with  a  record.  The  record  must  have  been  made  the  current 
record  of  the  process  prior  to  executing  the  "get”.  The 
values  of  the  record's  data  items  are  available  via  pointers 
associated  with  each  entity  for  which  the  record  is  current. 
Whether  an  implementor  elects  to  provide  separate  buffers 
for  each  currency  type  or  merely  reassign  the  value  of 
pointers  is  immaterial. 

3.  Independence  of  Schema  and  Sub-schema. 

A  user  program  can  be  compiled  without  reference  to 
the  schema  description.  A  program  usina  a  sub-schema  will 
continue  to  execute  until  the  data  base  name#  privacy  locks 
or  set  memberships  described  in  the  sub-schema  are  changed 
or  deleted  in  the  schema.  However#  changing  entries  such  as 
record  location  modes  and  set  selection  clauses  in  the 
schema  may  alter  the  loaic  of  a  program. 

The  data  types  of  the  sub-schema  may  differ  from 
those  of  the  schema.  The  DBMS  will  automatically  convert 
data  to  the  types  desired  by  the  sub-schema  before  delivery. 
Some  type  differences#  however#  may  cause  an  error  if  the 
data  involved  is  incompatable.  For  example#  converting  a 
string  of  characters  to  an  integer  will  fail  if  the  string 
contains  any  non-numeric  characters. 
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C.  Comoarison  with  the  COBOL  DDL  and  DML . 

The  comoapision  will  be  made  by  means  of  an  illustrative 
example.  For  a  detailed  description  of  the  C  and  COBOL  DDL 
and  OML  see  Appendix  A  and  Ref.  3  respectively.  The  COBOL 
portion  of  the  example  is  patterned  aftei*  Ref.  20,  Several 
modifications  were  made  to  reflect  both  recent  changes  in 
the  definitiort  of  the  COOASYL  DBMS  and  UNIX  implementation 
data  types. 

The  example  concerns  the  representation  of  a  personnel 
data  base.  Figure  3  is  a  diagram  of  the  network  and  records 
involved.  In  Fig.  1,  the  rectangles  represent  records  and 
the  arrows  point  from  set  owners  to  set  members.  Record 
names  are  written  above  rectangles^  item  values  within.  Set 
names  are  superimposed  over  the  set  linkage  arrows.  Multi¬ 
ple  agents  may  be  linked  to  an  assignment  and  multiple 
assignments  to  an  aoent.  The  fact  that  a  particular  agent 
is  assigned  to  a  particular  assignment  is  represented  by  the 
existence  of  a  LINK  record  which  has  membership  in  both  the 
AGENT-LINK  set  owned  by  that  agent's  AGENT  record  and  the 
ASS IGNMENT -L  I NK  set  owned  by  that  assianment's  ASSIGNMENT 
record.  The  link  records  are  made  necessary  by  the  restric¬ 
tion  that  a  record  may  only  be  a  member  of  one  set  of  a 
given  type. 

Figures  ^  and  5  show  the  DDL  description  of  the  schema. 
The  COBOL  sub-schema  could  be  essentially  an  exact  copy  of 
the  schema.  The  C  sub-schema  is  shown  in  Fig.  fe .  Note  that 
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an  agent's  number  is  reoresented  as  a  numeric  character 
string  in  the  schema  and  as  an  integer  in  the  C  sub”schema. 
In  the  C  sub-schema»  all  the  data  base  names  are  spelled 
with  lower  case  letters  and  with  hyohen  replaced  by 
underscore  ("«-**).  The  DBMS  translates  identifiers  to  allow 
for  this  difference  in  spelling  conventions.  In  the  UNIX 
i mp 1 emen t a t i on f  data  base  names  spelled  with  any  combina¬ 
tions  of  upper  and  lower  case  letters  will  be  properly 
r ecogn i zed . 


Department 


1  Function:  espionage  I 
!  Head:  M  I 

. I . 

deoartment-aoent 


Ass i gnment 

Agent  --------------------- 

----------Y-----------  I  Name:  Goldfinger  1 


!  Name :  1  over  I  •  ' 

{Level:  10  !  - 


Network  Reoresent at i on  of  a  Data  Base. 
Figure  3 . 
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SCHEMA  NAME  IS  PERSONNEL-FILE 


AREA  NAME 
AREA  NAME 


IS  DEPARTMENT-AREA. 
IS  ASSIGNMENT-AREA. 


RECORD  NAME  IS  DEPARTMENT. 

LOCATION  MODE  IS  CALC  USING  FUNCTION 
DUPLICATES  ARE  NOT  ALLOWED; 

WITHIN  department-area. 

01  FUNCTION;  PICTURE  IS  "AC20)". 

01  HEAD;  PICTURE  IS  "Ad)". 

RECORD  NAME  IS  AGENT; 

LOCATION  MODE  IS  CALC  USING  NUMBER; 

DUPLICATES  ARE  NOT  ALLOWED; 

WITHIN  DEPARTMENT-AREA. 

01  FIRST-NAME;  PICTURE  "AdO)". 

01  last-name;  picture  "acio)". 

01  number;  picture  "9(3)". 

record  name  is  skill; 

LOCATION  MODE  IS  VIA  AGENT-SKILL  SET; 

WITHIN  AREA  OF  OWNER. 

01  NAME;  PICTURE  IS  "A(20)". 

01  level;  type  is  fixed  decimal. 

RECORD  NAME  IS  ASSIGNMENT; 

LOCATION  MODE  IS  CALC  USING  NAME  OF  ASSIGNMENT 
WITHIN  AREA  OF  OWNER. 

01  name;  picture  is  "A(20)". 

RECORD  NAME  IS  LINK; 

LOCATION  MODE  IS  VIA  AGENT-LINK  SET; 

WITHIN  AREA  OF  OWNER. 


Schema  DDL  Area 


and  Record  Entries 


Figure  9 . 
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SET 


SET 


SET 


SET 


NAME  IS  department-agent; 

OViNER  IS  DEPARTMENT; 

ORDER  IS  permanent  INSERTION  IS 
SORTED  BY  DEFINED  KEYS; 

MEMBER  IS  AGENT  MANDATORY 

AUTOMATIC  LINKED  TO  OWNER; 

KEY  IS  ASCENDING  NUMBER; 

SET  SELECTION  IS  THRU  DEPARTMENT- AGENT  OWNER 
IDENTIFIED  BY  CURRENT  OF  SET, 

NAME  IS  agent-skill; 

OWNER  IS  agent; 

ORDER  IS  permanent  INSERTION  IS  SORTED  BY  DEFINED  KEYS 
MEMBER  IS  SKILL  MANDATORY  AUTOMATIC; 

KEY  IS  DESCENDING  LEVEL; 

SET  SELECTION  IS  THRU  AGENT  SKILL  OWNER 
IDENTIFIED  BY  CURRENT  OF  SET. 

NAME  IS  agent-link; 

OWNER  IS  AGENT; 

ORDER  IS  PERMANENT  INSERTION  IS  IMMATERIAL; 
member  is  link  mandatory  automatic  LINKED  TO  OWNER; 

SET  SELECTION  IS  THRU  AGENT-LINK  OWNER  IDENTIFIED 
BY  CALC-KEY  EQUAL  TO  CURRENT- AGENT . 

NAME  IS  assignment-link; 

OWNER  IS  assignment; 

ORDER  IS  PERMANENT  INSERTION  IS  IMMATERIAL; 

MEMBER  IS  LINK  MANDATORY  AUTOMATIC  LINKED  TO  OWNER; 

SET  SELECTION  IS  THRU  ASSIGNMENT-LINK  OWNER 
IDENTIFIED  BY  CALC-KEY  EQUAL  TO 
CURRENT-ASSIGNMENT; 


Schema  DDL  Set  Entries. 


F i au  re  5  . 
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ddl  { 


schema  oer sonne  1  f  i  1  e ? 
area  deoa  r  t  men  t <*a  r ea  ? 
area  ass  i  qnment«*area  » 
record  department{ 

char  f unc t i on  120]  ; 
char  head  fl 1 ; 

> 

record  aqent { 

char  f  i  r s t «-name  [  1  0 ]  ; 
char  1  ast<-name  1 1  01  ; 
int  number; 

} 

record  sk i 1 1 { 

char  namet20]; 
int  level; 

> 

record  assiqnmenti 
char  name[20]; 

} 

record  1 i nk { > 

} 

set  deoa  rtment<*aqent  owner  is  department  { 
member  aqent ; 

} 

set  aqent<-skill  owner  is  aqent  { 
member  skill; 

> 

set  aqent^-link  owner  is  aqenti 
membe  r  link; 

> 

set  ass  i  qnment  <•!  i  nk  owner  is  assiqnmenti 
member  link; 

> 

} 


C  Sub-schema  Entries. 
F i ou re  6 . 
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1 .  Query  I 


The  first  query  is  desianed  to  extract  the  skills  of 
agent  007,  The  procedure  is  to  initialize  the  agent  number/ 
FIND  the  agent  and  orint  the  agent  number/  skill  name  and 
skill  level  for  each  skill  the  agent  has  (if  any).  The 
COBOL  and  C  realizations  of  Query  1  are  shown  in  Fig.  7  and 
0  resoectively.  In  both  versions  access  to  the  agent  is  via 
the  CALC  key  in  the  AGENT  record  and  the  appropriate  AGENT- 
SKILL  set  is  automatically  selected  when  the  agent  007 
becomes  current.  The  agent  record  need  not  be  fetched  since 
none  of  its  data  fields  are  needed. 


OPEN  ALL. 

FINO-AGENT-RECORO. 

MOVE  *007'  TO  NUMBER  OF  AGENT. 

FIND  AGENT  RECORD. 

READ-FIRST-SKILL. 

FIND  FIRST  SKILL  RECORD  OF  AGENT-SKILL  SET. 

IF  ERROR-STATUS  =  0326  GO  TO  ALL-DONE. 

PRINT-SKILL. 

GET. 

DISPLAY  'AGENT  =  '/  NUMBER  OF  AGENT/  '/  SKILL  =  '/ 
NAME  OF  SKILL,  '/  LEVEL  =  LEVEL  OF  SKILL. 
READ-NEXT-SKILL. 

FIND  NEXT  SKILL  RECORD  OF  AGENT-SKILL  SET. 

IF  ERROR-STATUS  =  0307  GO  TO  ALL-DONE, 

ELSE  GO  TO  PRINT-SKILL. 

ALL-DONE. 


Query  1  Coded  in  COBOL. 
Figure  7 . 
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dbooen ( ) ; 
agent. number  =  7; 
f i ndCaqent ) ; 

f  or  (  f  i  nd  {  f  i  Pst  skill  of  aoen  t  ♦-s  k  i  1  1  )  ;  1  e  r  ro  r  .  s  t  a  t  us  ;  )  < 
get  ( ) ; 

orint  f  ("Agent  =  Skill  =  %s,  Level  =  r.s\n", 

agent. number»ski 1 1  .name»ski 1 1 .level )» 
find(next  skill  of  agent*-sk  i  1  1  ) ; 

} 

Query  1  Coded  in  C. 

F i ou  re  8  . 

2 .  Que  r y  2 . 

The  second  examole  Query  is  designed  to  find  all 
department  heads  concerned  with  the  assignment  "THUNDER- 
BALL".  The  procedure  is  as  follows.  FIND  the  ASSIGNMENT 
record  whose  NAME  is  "THUNDERBALL".  For  each  LINK  record  in 
the  assignment's  ASSIGNMENT-LINK  set» 

.  find  the  link's  owner  in  the  AGENT-LINK  set  it  belongs 

t  o  t 

.  find  that  AGENT  record's  owner  in  the  DEPARTMENT-AGENT 
set  it  belongs  to  and 

.  print  the  assignment  name  and  department  head. 

The  COBOL  and  C  realizations  of  Query  2  are  shown  in 
Fig.  9  and  10  respectively.  In  both  versions^  once  the 
desired  link  is  made  currents  the  appropriate  DEPARTMENT 
record  can  be  reached  via  the  AGENT-LINK  and  DEPARTMENT- 
AGENT  set.  Note  that  only  the  DEPARTMENT  records  need  to  be 
fetched  since  all  the  soannino  of  access  oaths  is  accom¬ 
plished  throuoh  currency  information. 
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OPEN  ALL. 

FINO-ASSIGNMENT-RECORD. 

MOVE  '  THUNDERBALL *  TO  NAME  OF  ASSIGNMENT. 

FIND  ASSIGNMENT  RECORD. 

FIND-FIRST-LINK. 

FIND  FIRST  LINK  RECORD  OF  ASSIGNMENT-LINK  SET. 
IF  error-status  =  0326  GO  TO  ALL-DONE. 
FIND-AGENT-LINK-OWNER. 

FIND  OWNER  RECORD  OF  AGENT-LINK  SET. 

RE AO-OEPARTMENT -RECORD. 

FIND  OWNER  RECORD  OF  DEPARTMENT -AGEN T  SET. 

GET. 

PRINT-DEPARTMENT -HE AO. 

display  ’ASSIGNMENT  =  NAME  OF  ASSIGNMENT, 
HEAD  =  HEAD  OF  DEPARTMENT. 
FIND-NEXT-LINK. 

FIND  NEXT  LINK  RECORD  OF  ASSIGNMENT  LINK  SET. 
IF  ERROR-STATUS  =  0307  GO  TO  ALL-DONE 
ELSE  GO  TO  FIND-AGENT-LINK-OWNER. 

ALL-DONE. 


Query  2  Coded  in  COBOL. 
Fiaure  9. 


dbooen ( ) ; 

for(  i  =  0;  i  <12?  i  t  +  Iassionfrient  .name  li  J  =  "THUNDERBALL'*  (i  ]  ; 
find(assianment); 

f  o  r  (  f  i  nd  (  f  i  rst  link  of  ass  i  anment  H  nk  )  ;  J  e  r  ro  r  .  s  t  a  t  us  ? )  { 
findlowner  of  aaent*-!  i  nk  )  ? 
findCowner  of  deoartment^-aqent  )  ? 
get  ( ) ; 

printff” Assignment  =  %s.  Head  =  %s\n", 
assi anment .name, department .head) ? 
find(ne*t  link  of  assiqnment^-link); 

} 


Query  2  Coded  in  C. 
F i qu  r e  10. 
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V.  CONCLUSIONS  AMD  RECOMEND A T I ONS  . 


A,  Conclusions. 

Since  the  DBMS  compiler  and  the  C  language  augmentation 
are  not  yet  implemented^  it  is  difficult  to  fully  evaluate 
the  effectiveness  and  efficiency  of  the  DBMS.  In  generals 
it  can  be  said  that  the  UNIX  file  system  seems  to  be  a  very 
hospitable  environment  for  developing  a  DBMSf  however  the 
operating  system  facilities  of  UNIX  are  not  nearly  as  well 
suited  to  supportina  this  development.  The  DBMS  is  measured 
against  some  of  the  goals  of  DBMS  as  they  are  presented  in 
Sec tion  II. A. ^1. 

1.  Concurrent  Retrieval  and  Update. 

The  DBMS  cannot  provide  the  ability  to  perform  con¬ 
current  update  of  the  same  area  by  two  users.  Although  the 
ability  to  open  an  area  for  unprotected  uoaate  exists^  its 
use  can  be  disasterous.  Concurrency  between  update  and 
retrieval  in  an  area  causes  no  intregrity  problems;  however, 
the  user  doing  retrievals  has  no  way  of  knowing  if  the 
records  he  is  accessing  are  being  modified. 

2.  A  Variety  of  Search  Strategies. 

The  DBMS  supports  every  form  of  access  path  speci¬ 
fied  by  the  CODASYL  DDLC.  These  forms  are  direct,  hashed. 
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sequential  and  indexed. 

3.  Centralized  Placement  Control. 

Placement  control  by  the  DBMS  is  a  ourely  loaical 
mapoing  with  the  UNIX  file  system  providing  centralized 
placement  control  for  the  data  onto  physical  media. 

il .  Device  Independence. 

Device  independence  is  almost  total  for  any  file  in 
UNIX.  The  DBMS  (and  therefore  the  user  program)  is  unaware 
of  either  the  types  or  number  of  devices  in  the  system. 

5.  Privacy  of  Data, 

The  complete  privacy  mechanism  in  the  CDDASYL  design 
has  been  implemented.  The  DBMS  itself  should  be  relatively 
secure.  A  program  could  be  written  to  call  the  schema  DBM 
repeatedly  and  determine  a  privacy  key  by  trial  and  errors 
but  using  data  base  procedure  privacy  locks  which  notify  a 
security  console  or  terminate  the  proaram  when  a  violation 
occurs  can  greatly  reduce  the  effectiveness  of  trial  and 
error  "lock  picking". 

UNIX  itselff  however^  is  too  easily  penetrated  IRef. 
21).  Locating  and  plugging  all  the  holes  in  UNIX  may  be 
i mpos  s i b 1 e . 
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6.  Independence  of  Schema  and  Sub-schema 


The  DBMS  provides  the  maximum  amount  of  independence 
possible  under  the  CODASYL  desion.  In  fact»  user  proarams 
could  be  compiled  without  any  reference  at  all  to  the 
schema  . 

B.  Recomendat i ons . 

1.  Enhancement  for  Concurrent  Update. 

In  order  to  enhance  the  ability  for  concurrent  use 
of  a  data  base»  the  following  approaches  might  be  taken. 

a.  Centralized  Schema  O0M. 

UNIX  could  be  modified  to  provide  a  mechanism 
for  establishing  interprocess  communication  to  any  desig¬ 
nated  process.  This  would  enable  implementation  of  a  cen¬ 
tralized  schema  DBM  as  the  CODASYL  OOLC  intended.  This 
alternative  remains  impractical  for  the  reasons  Section 
Ill.Cf  i.e.»  the  schema  DBM  would  run  out  of  file  resources. 
Add i t i ona 1 1 y »  the  UNIX  modification  would  have  an  unknown 
but  probably  major  impact  on  the  operating  system's  design. 

b.  System  P  and  V  Call. 

UNIX  could  be  modified  to  provide  a  system  call 
for  P  and  V  ooerators.  For  a  discussion  of  P  and  V  opera¬ 
tors  see  Ref.  22.  If  a  fast  P  and  V  facility  were  avail- 
able»  a  schema  DBM  could  temporarily  halt  all  update  or 
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access  f o  an  area  while  oerforminq  modifications.  "Keep" 
status  could  be  implemented/  if  desired/  by  storinq  indica¬ 
tors  in  the  record  itself.  The  impact  of  such  system  calls 
on  UNIX's  design  philosophy  is  expected  to  be  minimal. 

Additionally/  existing  communications  between 
schema  DBM  programs  could  be  speeded  up.  Specifically/  the 
methods  used  with  "ooenoum"  and  "indexdum"  to  lock  out 
simultaneous  update  are  essentially  a  "test  and  set"  opera¬ 
tion  which  could  be  implemented  more  efficiently  with  P  and 
V  system  calls. 

2.  Enhancement  for  Faster  Access. 

In  the  absence  of  usaoe  data/  it  is  difficult  to 
estimate  the  access  response  soeed  of  the  DBMS.  However/  a 
logical  extension  to  the  access  methods  provided  by  the  DBMS 
would  be  multilevel  indices.  The  index  structure  now  in  the 
DBMS  is  essentially  an  index  sequential  access  scheme  which 
could  be  upgraded  to  the  multilevel  structure  which  is  typi¬ 
cal  of  such  indices.  For  a  disscusion  of  multilevel  index 
sequential  access  method/  see  Ref.  13.  Use  of  a  two  leveled 
index  would  divide  the  average  number  of  records  scanned  to 
find  the  right  index  block  by  the  average  number  of  entries 
in  the  index  blocks. 

3.  Automatic  Garbage  Collection. 

Since  only  the  Data  Administrator  can  initiate  gar¬ 
bage  collection/  the  wasted  space  growth  rate  in  a  data  base 


may  become  a  problem.  Some  consideration  should  be  give  to 
having  the  schema  OBN^  automatically  garbage  collect  when  the 
waste  in  an  area  reaches  a  critical  level.  This  thesis  did 
not  address  the  problem  of  automatic  garbage  collection  due 
to  the  difficulty  of  determining  what  amount  of  wasted  space 
is  critical. 
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APPENDIX  A.  C  LANGUAGE  DDL  AND  DML . 

A,  C  Language  DDL. 

The  DDL  in  C  is  designed  to  interface  the  subschema 
description  with  the  schema  description  with  a  minimal 
requirment  for  path  information  from  the  user  and  maximal 
s i m i 1  a r i t V  w i t h  existing  C  language  constructs.  In  the  fol¬ 
lowing  discuss ion»  words  enclosed  in  apostrophes  denote 
variable  data.  When  a  'locic'  is  specified^  the  data  item 
must  be  of  tyoe  "character  pointer".  All  '  db  identifiers’ 
specified  must  match  the  appropriate  data  base  names  in  the 
schema  after  translation  of  lower  case  into  upper  case  and 
underscore  into  dash.  All  DDL  statements  are  enclosed  in  a 
"ddl"  routine  with  the  following  format: 

"ddl{...ddl  st at ement s  .  .  .  > "  . 

The  ddl  routine  should  appear  prior  to  any  DML  statements. 
It  may  be  contained  in  a  file  INCLUDE'd  at  an  appropriate 
point  in  the  program.  The  statements  in  order  of  appearance 
are  as  foil ows . 

1  .  Schema  Entry. 

The  "schema"  statement  identifies  the  schema  name 
and  its  privacy  lock.  Its  format  is 

"schema  ' db  identifier*  with  lock  'lock';". 

The  * db  identifier'  must  match  the  schema  name  and  the  lock 
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must  match  the  orivacy  lock  for  the  schema  entry  (see  Ref. 
2f  section  3.1.0). 

8.  Area  Entries. 

For  each  area  to  be  usedf  an  area  entry  must  be  made. 
These  entries  must  be  in  the  same  order  as  the  area  entries 
in  the  schema.  The  format  of  an  area  entry  is 

"area  ' db  identifier  list'  lock  is  'lock  list';"^ 
where  a  '  db  identifer  list'  is  one  or  more  comma  seoarated 
' db  identifier's.  A  lock  list  is  one  or'  more  comma 
separated  lock  entries  of  the  form 

"'lock'  for  'modifier*  'function* 
where  the  modifier  is  optional.  For  an  area  entry^  the 
allowable  modifiers  are  "exclusive"  and  "protected"  and  the 
allowable  fuctions  are  "uodate"  and  "retrieval".  The  'db 
identifier's  and  'lock's  must  match  the  area  names  and 
orivacy  locks  in  the  schema  area  entries  (see  Ref.  2,  sec¬ 
tion  3.2.0). 


1.  Record  Entries. 

Record  entries  must  be  made  in  the  same  order  as  the 
corresponding  record  entries  in  the  schema.  A  record  entry 
is  similar  in  construction  to  a  C  language  structure  defini¬ 
tion.  Its  format  is 

"record  'db  identifier'  lock  is  'lock  list*  ('item  1ist'}»"» 
where  the  "lock  is  'lock  list'"  ohrase  is  optional.  The 
'lock  list'  for  records  has  no  modifiers  defined.  The  func- 
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tions  allowed  are  "insert",  "remove",  "store",  "delete", 
"modify"  and  "find".  The  ' db  identifier'  and  'lock's  must 
correspond  to  those  of  the  schema  record  entry  (see  Ref.  2, 
sec  t i on  ^.2.3). 


The  item  list  in  a  record  entry  is  composed  of  a 
series  of  item  entries  of  the  following  form: 

"'type  specfier'  ' db  identifier'  ['constant  expres¬ 
sion']  lock  is  'lock  list';", 

where  the  "['constant  expression'!"  and  "lock  is  'lock 
list'"  phrases  are  optional.  A  type  specifier  is  of  the 
form  "int",  "char",  "float",  "double",  "dbkey"  or 
" s t rue t { ' i t em  list'!".  These  data  types  are  identical  to 
those  of  the  C  language  with  the  addition  of  "dbkey".  An 
item  of  tyoe  "dbkey"  appears  to  be  an  array  of  four  charac¬ 
ters  to  the  C  user.  The  'lock  list'  for  items  has  no  modif¬ 
iers  defined.  The  permissible  functions  are  "store",  "get" 
and  "modify".  The  item  entries  may  aopear  in  any  order  in 
the  item  entry  list  with  the  following  restrictions.  Items 
must  aopear  with  the  same  records  as  in  the  schema.  The 
data  tyoe  of  the  item  must  be  comoatable  with  the  schema 
item.  Items  of  type  "struct"  must  corresoona  to  repeating 
groups  in  the  schema  and  have  the  same  dimensionality  as  in 
the  schema.  Items  apoearing  in  a  repeating  group  in  the 
schema  must  aooear  in  the  item  list  of  the  structure 
corresponding  to  that  repeating  arouo.  Any  item  in  the 
schema  record  description  may  be  omitted. 
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The  record  names  can  be  used  in  non*0ML  statements 
as  structures  whose  format  is  identical  to  the  record  entry. 
These  record  structures  are  olobal  names  and  contain  the 
current  record  of  the  respective  type. 

2.  Set  Entries. 

For  each  set  to  be  referenced^  a  set  entry  of  the 
following  format  must  exist: 

"set  '  db  identifier'  lock  is  'lock  list'  owner  is 
' db  identifier'  *  '  i den t i f i e r '  {'member  1ist'>;"» 
where  the  "lock  is  'lock  list'"  ohrase  is  optional.  The 

'lock  list'  for  sets  has  no  modifiers.  The  defined  func* 
tions  are  "insert"/  "remove"  and  "find".  The  set  name  'db 
identifier'  and  'lock's  must  match  those  of  corresponding 
set  entry  in  the  schema  description  and  all  set  entries  must 
be  in  the  same  order  as  in  the  schema  description.  The 
second  'db  identifier'  must  match  the  owner  name  of  the  set. 
The  member  list  is  composed  of  one  or  more  member  entries. 

3.  Member  Entries. 

A  member  entry  has  the  form 

"member  ' db  identifier'  *  '  i dent i f i e r ' 
lock  is  'lock  list'/"/ 

where  the  "lock  is  'lock  list'"  phrase  is  optional.  The 
'lock  list'  for  members  has  no  modifiers  defined  and  the 
defined  functions  are  "insert"/  "remove"  and  "find".  The 
'db  identifier'  must  be  the  name  of  a  record  defined  in  the 
schema  as  a  member  record  for  the  set  being  described. 
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The  **identifier's  named  in  the  set  and  member 
entries  become  global  pointers  to  the  aopropriate  record 
structure.  These  pointers  can  be  used  to  reference  the 
current  owner  record  and  member  record^  respec t i ve I y »  of  the 
set.  In  addition  the  set  name  ’ db  identifier'  is  the  name 
of  a  character  array  which  holds  the  current  record  of  the 
set . 


C.  C  Language  DML . 

The  DML  has  several  global  names  and  functions  associ¬ 
ated  with  it.  Besides  the  records  item  and  set  names  from 
the  ddl  routine^  there  are  the  pointers  "areaname'’>  "rec- 
name"  and  the  structure  "error".  The  "areaname"  pointer 
contains  the  address  of  the  area  array  containing  the 
current  record  of  the  process.  The  "recname"  pointer  con¬ 
tains  the  address  of  the  record  structure  for  the  current 
record  of  the  process.  Note  the  "recname"  provides  the  user 
with  the  record  type  of  the  current  record  of  the  process; 
but  the  current  record  of  the  process  may  not  be  the  current 
record  of  that  type  and  therefore  the  record  pointed  to  may 
not  be  the  current  record  of  the  process.  The  current 
record  of  the  process  will  always  be  available  in  the  area 
array  pointed  to  by  "areaname".  "Areaname"  and  "recname" 
are  set  whenever  a  find  or  store  function  is  executed.  The 


"error"  vector  is  a  structure  with  the  following  format: 

St  rue  t  { 

int  status? 
i n  t  t  yoe  ? 
char  *setf 
char  *record? 
char  *area? 
i nt  count  ? 

>  error? 

The  error  codes  for  C 
patible  with  error 
OML.  Additionally/  a 
of  four  characters 
function. 


The  use  of  the 

DML 

causes  certain 

i dent i f i ers 

to  be  gen- 

e  r a  t  ed 

1  globally/ 

hence 

t  hese  should 

be 

treated 

as  reserved 

words 

by  the  user. 

These 

reserved  words 

are  : 

• 

all  a  rea /  record 

and  set  names 

• 

a  r eaname 

.  permit 

• 

remove 

• 

recname 

.  store 

• 

emo  t  y 

• 

error 

.  member 

• 

owner 

• 

dbooen 

.  current 

• 

duD 1 i c  a  t  e 

• 

dbc lose 

.  areaid 

• 

keyname 

• 

f  i  nd 

.  get 

« 

insert 

• 

mod i f  V 

.  key 

// 

pointer 

to 

^et  array 

for  error 

// 

pointer 

to 

record  for 

error 

// 

pointer 

to 

area  array 

for  error 

DML  functions  are  designed  to  be  com- 
codes  defined  in  Ref.  3  for  the  COBOL 
pointer  called  "areaid"  and  an  array 
called  "keyname"  exist  for  the  store 


1.  OML  Expressions. 

The  OML  introduces  two  additional  expression  tyoes 
into  C.  These  are  OML  lodical  expressions  and  OML  record 
selection  expressions. 

a,  OML  Logical  Expressions. 

These  expressions  evaluate  to  a  true/false  value 
and  can  be  used  in  a  manner  identical  to  normal  C  loaical 
expressions.  Their  forms  are 

(1)  " '  db  identifier'  emoty"»  where  ’ db  identif” 
ier’  must  appear  as  a  set  name  in  a  set  entry.  The  expres¬ 
sion  evaluates  true  if  and  only  if  the  current  set  of  the 
type  specified  has  no  members. 

(2)  "member  of  ' db  i dent i f i er ' " »  where  * db  iden¬ 
tifier'  must  be  a  set  name.  It  evaluates  true  if  and  only 
if  the  current  record  o'f  the  process  is  a  member  of  a  set  of 
the  type  specified. 

(3)  "owner  of  ' db  i den t i f i e r  '  " r  where  '  db  iden¬ 
tifier'  must  be  a  set  name.  It  evaluates  true  if  and  only 
if  the  current  record  of  the  orocess  is  the  owner  of  a  set 
of  the  tyoe  specified. 

b.  DML  Record  Selection  Exoressions. 

These  exoressions  result  in  a  data  base  key 
which  can  be  used  to  fino  a  record.  They  are  evaluated  in 
part  within  the  user  nrogram  hut  must  be  validated  by  the 
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schema  dbm  orogram.  They  must  aoDear  only  in  DML  function 
argument  lists.  The  forms  oossible  are  as  follows. 

(1)  Exolicit  Key.  The  simolest  form  of  record 
selection  expression  is  by  explict  key.  The  format  is 
"*key'".  The  'key'  must  be  either  an  item  of  tyoe  "dbkey" 
or  evaluate  to  a  character  oo inter.  The  contents  of  the 
'key*  are  used  as  a  data  base  key.  This  form  is  useful  for 
accessing  records  whose  keys  are  known.  It  can  also  be  used 
for  aoplyina  currency  which  has  previously  been  suppressed 
(e.q.  " f i nd ( key ( p roc  ess ) ) f "  applies  all  appropriate  currency 
to  the  current  record  of  the  process). 


(2)  Owner  Record.  Selection  of  an  owner  record 


has 
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db  i den  t i f  i 

er  ’  owner 

o  f 

'key 

where  the 

"of 

'key'" 

phrase  i 
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i den  t 1  tier’ 

is  a  set 
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key 
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If  the  " 

of  'key'" 

phrase  is 

not  used# 

the  owner 

record 

of 

the 

cur  ren  t 

i  nst  ance 

of  the  set  soecified  is  selected?  otherwise  the  owner  in  the 
set  tyoe  specified  for  the  record  identified  by  the  'key'  is 
selected. 


(3)  Relative  Selection.  This  form  allows  the 
selection  of  a  particular  record  from  an  area  or  set  based 
on  a  location  criterion.  The  expression  has  the  format 
"'criterion'  ' db  identifier'  of  ' db  identifier’".  The  first 
'db  identifier'  is  ootional  and  is  the  name  of  a  record 
tyoe.  The  second  ' db  identifier'  is  the  name  of  an  area  or 
set  type.  The  'criterion'  determines  the  location  within 


the  area  or  set  from  which  the  record  will  be  selected.  The 
allowed  criteria  are  "next"»  "orior",  '’first''f  "last"  and  an 
expression  which  evaluates  to  an  inteaer.  When  the  record 
type  is  included/  only  occurences  of  that  type  record  will 
be  considered  for  selection.  The  'criterion'  refers  to  the 
ordering  of  the  area  or  set.  The  ordering  of  an  area  is 
considered  to  be  ascending  seouence  by  data  base  key. 
"Next"  and  "prior"  are  relative  to  the  current  record  of  the 
area  or  set.  If  the  current  record  of  the  set  is  the  owner 
record/  "next"  and  "orior"  are  eouivalent  to  "Hrst"  and 
"last"  respectively. 

(y)  CALC  Key,  If  a  record  type  is  defined  in 
the  schema  as  having  a  location  mode  oif  CALC/  the  format 
"duplicate  ' db  identifier'"/  where  "duplicate"  is  optional/ 
may  be  used.  The  'db  identifier'  is  a  record  type  defined 
in  the  schema  to  have  location  mode  CALC.  Prior  to  the 
evaluation  of  the  record  selection  expression/  the  items  in 
the  record  designated  as  oart  of  the  CALC  key  must  have  been 
initialized  to  the  desired  values. 

If  the  "duplicate"  phrase  is  included/  the 
current  record  of  the  process  must  be  of  the  specified  type 
and  have  the  same  CALC  kev  as  in  the  record  buffer.  If 
these  conditions  are  satisfied/  a  synonym  to  the  current 
record  is  selected. 
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(5)  Data  Value.  Selection  by  data  value  is 
Dossible  using  the  format  "duolicate  'db  identifier'  via 
'set  select'  'db  identifier'  ==  'db  identifier'  where 

the  ohrase  "==  '  db  identifier'  and  the  word  "dupli* 

cate"  is  optional.  The  first  '  db  identifier'  is  a  record 
name?  the  seconds  a  set  name?  and  the  string  of  ' db 

•identifier's  is  made  up  of  items  in  the  named  record  type. 
The  'set  select'  ohrase  consists  of  either  the  word 

"current"  or  the  format  "  '  db  identifier'  ...  select".  The 
'db  identifier'  string  in  the  'set  select'  phrase  is  of  made 
UP  of  the  items  needed  for  the  selection  path  specified  in 
the  SELECTION  clause  for  the  the  named  record  type  and  named 
set  type. 

If  the  word  "duplicate"  is  omitted/  the 
expression  selects  the  first  record  occurence  in  the 
appropriate  set  with  values  matching  those  of  the  items  in 
the  string  of  'db  identifier's.  If  the  string  is  not  speci¬ 
fied/  the  first  record  of  the  named  type  in  the  set  is 
selected.  When  the  list  of  items  is  specified/  the  items  in 
the  list  must  have  been  initialized  to  the  desired  values 
prior  to  evaluation  of  the  record  selection  expression.  If 
"current"  is  included/  the  current  instance  of  the  named  set 
is  used/  otherwise  the  set  used  is  selected  on  the  basis  of 
the  selection  criteria  in  the  schema  for  the  named  record 
type  as  a  member  of  the  named  set  type. 

If  the  word  "duplicate"  is  included/ 
"current"  must  be  included  and  the  item  list  is  mandatory. 


The  record  selected  will  be  the  next  record  in  the  set 
match inq  the  current  record  of  the  process  in  the  fields 
named  in  the  item  name  strinq» 

2 •  DML  Rout i nes  . 

The  ability  to  access^  retrieve  and  update  records 
is  provided  by  DML  routines*  The  routines  are  divided  into 
area  manipulations  Cdbooen^  dbclose)^  record  manipulations 
(key  through  delete)  and  set  manipulations  (insert#  remove)* 
The  permit  function  does  not  fit  into  any  of  these 
categories*  Considerable  overlap  between  categories  exists 
among  the  other  functions.  All  functions  have  the  same  form 
as  normal  C  subroutine  calls-  In  the  following  description# 
all  error  codes  listed  have  a  two  decimal  digit  major  code 
and  a  two  decimal  digit  minor  code  specifying  the  function 
and  specific  error  respectively* 

a*  Permit*  The  permit  function  must  be  called  only 
once  and  must  be  before  any  other  DML  functions.  It  causes 
validation  of  the  subschema  by  the  schema  DBM  program  and 
establishment  of  the  privacy  permissions  reauired*  If  the 
schema  lock  is  violated#  an  error  code  of  0010  is  returned 
in  error*status*  If  any  other  privacy  lock  is  failed#  no 
indication  is  given  until  the  user  program  attempts  to  use 
the  feature  not  properly  unlocked.  If  a  mismatch  occurs 
between  the  schema  definition  and  the  subschema  definition# 
error  code  0060  will  be  returned  in  e r ro r  .  s t a t us  .  When  this 
occurs#  error. count  will  contain  the  number  of  i nc ompa t i b i 1 - 
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ities#  error^tyoe  will  be  ?  or  3  depending  on  whether  the 
first  error  encountered  was  in  an  area^  record  or  set  entry; 
error. area#  error.record  and  error. set  will  indicate  the 
first  erroneous  entry  in  the  area#  record  and  set  entries 
respectively.  The  entry  number  returned  identifies  entries 
in  the  C  subschema  and  is  zero  if  no  errors  were  encoun¬ 
tered.  If  any  other  OML  function  is  attempted  prior  to  per¬ 
mit#  error  code  nnbl  will  be  returned#  where  nn  indicates 
thefunctionattempted. 

b.  Dbooen.  Prior  to  processing  any  records  in  an 
area#  the  user  program  must  call  dbopen  to  open  the  area. 
Dbooen  parameters  are  an  opening  mode  and  a  list  of  area 
names.  The  opening  mode  is  an  octal  code  formed  as  follows. 
If  the  low  order  bit  is  1#  the  mode  is  for  update  and 
retrieval  otherwise  it  is  retrieval  only.  The  next  most 
significant  two  bits  are  zero  for  concurrent  update  per- 
mited#  1  for  concurrent  retrieval  but  no  concurrent  update 
(protected  mode)#  and  2  or  3  for  no  concurrent  use  permitted 
(exclusive  mode).  the  areas  in  the  parameter  list  are 
opened  in  the  specified  mode.  If  no  area  list  is  specified# 
all  the  areas  in  the  subschema  are  opened.  For  all  tem¬ 
porary  areas  opened#  a  mode  of  exclusive  update  is  assumed 
no  matter  what  mode  is  specified.  ^odes  allowing  concurrent 
processes  to  update  areas  are  included  for  comoa t ab i 1 i t v 
purposes#  however#  unless  the  implementation  of  the  data 
base  management  system  is  modified#  these  modes  can  cause 
severe  integrity  problems. 


To  successfully  execute  a  find^  stores  delete  or 
closer  approoriate  areas  must  be  ooen  as  follows:  all  areas 
which  contain  any  record  occurence  which  would  be  deleted  or 
removed  by  a  delete  statement  and  all  areas  which  are  the 
objective  of  a  close  function*  If  any  of  these  functions 
fails  to  meet  these  conditioner  error  status  nnOl  is 
returnedr  where  nn  indicates  the  function  attemoted* 

In  addition  to  the  areas  containinq  the  object 
records  of  the  functions  cited  abover  there  are  additional 
(i.e*  implicit)  areas  which  could  be  impacted  by  DML  func¬ 
tions*  This  impact  can  be  of  two  forms:  the  DBM  program 
requires  information  contained  within  the  implicit  area  (in 
which  case  the  area  must  be  "available’*)  or  the  DBM  program 
must  alter  the  information  contained  in  records  in  the 
implicict  area  fin  which  case  the  area  must  not  only  be 
available/  but  it  must  permit  the  necessary  alteration). 
Implicit  areas  reauirina  modification  are  termed  "affected"* 

A  user  may  assume  the  following  areas  will  be 
affected:  all  areas  containina  any  record  which  participates 
in  a  set  occurence  into  which  a  record  is  to  be  inserted  or 
from  which  a  record  is  to  be  removed  or  deleted  and  all 
areas  containinq  any  records  which  participate  in  any  set 
occurence  whose  membership  or  seauence  is  altered  by  a  store 
or  modify  function.  If  an  implicit  area  which  is  affected 
is  not  open/  error  code  nn21  will  be  returned  to 
e r ro r * s t a t us /  where  nn  indicates  the  function  attempted. 


To  successfully  execute  insertf  remove#  store# 
delete  or  modify  functions#  both  explicitly  and  implicitly 
affected  areas  involved  must  be  opened  for  update#  If  any 
of  the  involved  areas  are  open  for  retrieval  only#  an  error 
code  of  nn09  will  be  returned  to  error.status  where  nn  indi¬ 
cates  the  function  attempted. 

Record  occurences  which  are  in  the  search  path 
of  a  find  or  an  implicit  find  which  is  the  result  of  a 
store#  remove  or  delete  function  need  only  be  in  areas  which 
are  available#  In  order  for  an  area  to  be  available#  it 
must  not  be  opened  for  an  exclusive  mode  by  a  concurrent 
process#  Although  it  need  not  be  ooen#  the  full  overhead  of 
a  dbooen  and  dbclose  will  he  incurred  for  each  implicit 
reference  to  an  area  which  is  not  ooen.  If  an  implicit  area 
is  not  available#  error  code  nnl8  will  be  returned  to 
e r ro r . s t a t us #  where  nn  indicates  the  function  attempted# 

Any  attempt  to  execute  a  dbopen  function  which 
would  result  in  a  usage  mode  conflict  for  any  area  will 
result  in  the  failure  to  open  every  area#  Additionally# 
error  code  0929  will  be  returned  in  e r r o r . s t a t u s #  A  usage 
mode  conflict  will  occur  under  the  following  conditions:  any 
mode  of  update  on  an  area  opened  in  an  exclusive  or  pro¬ 
tected  mode  bv  another  process#  any  protected  mode  on  an 
area  opened  for  update  by  another  process?  exclusive  mode  on 
an  area  opened  for  any  mode  by  another  process#  and  any  mode 
on  an  area  ooened  for  exclusive  use  by  another  process#  In 
order  to  prevent  deadlock  conditions#  a  process  should  open 
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all  areas  needed  for  exclusive  or  orotected  use  in  one  dbo* 
pen.  If  a  dbopen  fails  because  of  usage  conflictr  the  pro¬ 
cess  should  close  any  other  ooen  areas  obtained  previously. 

If  a  privacy  lock  is  violated^  error  code  0910 
is  returned  in  e r ro r  .  s t a t us .  If  an  area  opened  was  already 
open/  warning  error  code  0928  is  returned  in  e r r o r  .  s t a t us . 
The  total  number  of  errors  encountered  is  returned  in 
error  .count  . 


c  •  Dbc lose. 

When  an  area  is  no  longer  needed  it  may  be 
released  for  use  by  other  processes  with  the  dbclose  func¬ 
tion.  Dbclose  parameters  are  a  list  of  area  names.  All  the 
areas  in  the  list  are  closed.  If  the  parameter  list  is 
omitted#  all  open  areas  are  closed.  After  the  dbclose  is 
executed#  all  current  records  in  closed  areas  cease  to  be 
current.  If  any  area  named  in  the  parameter  list  is  not 
open#  error  code  0101  is  returned  to  error. status  and 
error. count  will  contain  the  the  number  of  errors  detected. 

When  the  process  terminates  (even  abnormally)# 
no  dbclose  is  needed  as  all  areas  will  be  closed.  If  the 
dbclose  function  is  executed  on  a  temporary  area#  the  data 
within  the  area  is  not  lost  and  the  area  can  be  reopened  and 
processed.  When  the  process  terminates#  however#  all  tem¬ 
porary  areas#  ooen  or  closed#  are  lost. 
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The  find  function  allows  the  user  program  to 
select  a  record  from  the  data  base  and  make  i\t  the  current 
record  of  the  run  unit  and»  se 1 ec t i ve 1 y r  of  the  appropriate 
record  and  set  types.  The  parameters  are  a  record  selection 
expression  and  a  suppress  code.  The  record  selection 
expression  is  discussed  in  l.b  above.  The  suppress  code  is 
an  octal  code  whose  least  significant  bit  indicates  set 
supression  and  whose  next  least  significant  bit  indicates 
record  suppression.  If  set  suppression  is  indicated/  addi” 
tional  find  parameters  are  oermited/  each  of  which  is  the 
name  of  set  type. 

Execution  of  a  successful  find  function  causes 
the  selected  record  to  become  the  current  record  of  the  pro- 
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indicated.  the  object  record  does  not  become  current  for 
these  types.  When  the  list  of  set  names  is  included, 
currency  update  is  suppressed  only  in  the  named  sets.  After 
a  find,  the  data  fields  of  the  record  are  not  available,  but 
its  data  base  key  can  be  derived  (through  the  key  function), 
a  pointer  to  its  area  buffer  is  in  "areaname"  and  a  pointer 
to  the  appropriate  record  type  structure  is  in  "recname". 
The  record  can  now  be  retrieved  bv  the  get  function. 
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The  following  error  codes  may  be  returned  into 


error, status  by  a  find. 


0301 

The  sought  record  is  in  an  area  which  is  not  open. 

0310 

A  record  occurence  alona  the  search  oath  of  the 

find  is  in  an  area  under  the  exclusive  control  of 

another  orocess. 

0302 

A  data  base  key  was  supplied  or  developed  which  is 

incomoatable  with  the  areas  specified  for  a  record 

of  this  t  vpe  . 

0307 

An  end  of  area  or  end  of  set  condition  was 

detected. 

0326 

No  record  in  the  area  for  selection  through  CALC 

key  satisfies  the  record  selection  expression. 

0322 

Owner  record  selection  is  specified  and  the  data 

base  key  given  is  for  a  record  which  does  not  par¬ 
ticipate  in  a  set  of  the  desired  tyoe. 

0323 

Relative  selection  was  specified  and  the  specified 

record  cannot  be  in  the  desired  area. 

0310 

A  orivacy  breach  was  attempted. 

0361 

No  call  to  the  oermit  function  has  been  made. 

e , 

.  Get. 

The  get  function  is  used  to  transfer  the  data 

values  of  the  current  record  of  the  orocess  into  the  pro¬ 
cess'  buffers.  Its  parameters/  which  are  optional/  are  item 
names  from  some  record  type.  If  the  record  type  and  item 
names  are  specified/  only  the  items  named  are  extracted.  If 
the  item  names  are  not  soecified/  all  the  items  defined  in 
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the  subschema  for  that  record  are  extracted.  A  get  must  be 
executed  for  a  record  before  any  of  its  item  values  can  be 
examined. 


The  following  error  conditions  may  be  returned 
to  error.status  for  a  get. 

0513  The  current  record  for  the  orocess  is  unknown. 

0510  A  privacy  breach  was  attempted. 

0520  A  record  name  is  specified  and  the  current  record 
of  the  process  is  not  of  that  type. 

0561  No  call  to  the  permit  function  has  .been  made. 

055^  Truncation  of  significance  occured  during  conver¬ 
sion  from  the  schema  type  to  the  subschema  type 
for  an  item. 

In  all  but  the  last  case#  no  data  is  transfered  to  the  user 
process . 

f  .  St  ore . 

The  store  function  is  used  to  create  a  new 
record  occurence  in  the  data  base.  It  acquires  space  and  a 
data  base  key  for  a  new  record  occurence  in  the  data  base# 
causes  the  data  items  in  the  record's  buffer  to  be  used  in 
initializing  the  record#  inserts  the  record  into  all  sets  in 
which  it  is  an  automatic  member  and  establishes  a  new  set 
ocurrence  of  each  set  type  for  which  the  record  is  defined 
as  an  owner  in  the  schema. 

The  parameters  of  the  function  are  a  record 
name#  a  suppress  code  and  one  or  more  set  names.  The  record 
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name  specifies  the  record  type  to  be  created*  The  suppress 
code  and  set  names  are  exactly  analogous  to  those  of  the 
find  function.  In  order  for  the  store  to  function  properly^ 
the  subschema  must  include  the  following:  the  named  record? 
the  ”data-base-i dent i f i ers”  or  set  specified  in  the  "LOCA¬ 
TION”  mode  clause  of  the  record?  at  least  one  of  the  areas 
specified  in  the  within  clause  for  the  record?  all  sets  in 
which  the  record  is  defined  as  an  automatic  member?  and  all 
”dat a-base- i den t i n er s  "  r  records  and  sets  specified  or 
referenced  in  the  "SELECTION”  and  "KEY”  clauses  of  the  set 
member  subentries  in  which  the  record  is  defined  as 
automatic  (see  Ref.  2^  section  3*^.0), 

Prior  to  callina  stores  it  is  the  user  program’s 

« 

responsibility  to  insure  the  followina  is  done.  All  data 
items  in  the  record  type  buffer  must  be  initialized.  If 
multiple  areas  are  defined  in  the  "WITHIN"  clause  for  the 
record  with  the  "dat a-base-dat a-name- 1 "  option  (see  Ref. 2f 
Section  3 .  "5 . 0 )  and  the  "LOCATION"  mode  is  not  directs 
"areaid"  must  contain  the  desired  area  pointer.  If  the 
"LOCATION"  mode  is  direct  with  the  "da t a-base-da t a-name- I  " 
optionf  keyname  must  have  the  appropriate  data  base  key 
stored  in  it.  If  any  automatic  membership  has  a  "SELECTION" 
method  of  "THRU  CURRENT" ^  the  current  record  of  the  set  type 
must  specify  the  correct  set.  All  data  items  mentioned  in 
the  selection  clauses  of  the  member  entries  which  are 
automatic  for  the  record  and  all  data  items  mentioned  in  the 
"LOCATION"  clause  of  the  record  entry  must  be  initialized. 
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If  an  error  occurs  durinq  the  execution  of  a 
store/  the  new  record  is  not  created/  no  currency  indicators 
are  changed  and  an  error  code  is  returned  to  e r ro r  .  s t a t us  . 
The  following  errors  and  codes  can  be  encountered. 


1201 

The  object  record  is  to  be  stored  in  an  area  which 

is  not  ooen , 

1221 

A  record  occurence  which  is  affected  by  the  store 

function  is  in  an  area  which  is  not  open. 

1209 

The  object  record  of  the  store  or  some  record 

occurence  affected  by  the  store  is  in  an  area 

which  is  open  for  retrieval  only. 

1218 

Some  record  occurence  needed  by  the  store  for 

information  fe.g.  search  paths)  is  in  an  area 

which  is  not  available. 

1212 

No  data  base  keys  are  available. 

121  1 

No  media  space  is  available. 

1202 

A  data  base  key  passed  by  the  user  or  aenerated 

via  a  "CALC"  procedure  is  not  valid. 

1205 

The  record  would  violate  a  "DUPLICATES  NOT 

ALLOWED"  clause  defined  for  one  of  the  records  or 

sets  involved. 

1225 

For  one  of  the  set  types  involved  a  set  occurence 

cannot  be  matched  to  the  relevant  set  selection 

criteria. 

1210 

A  privacy  breach  was  attempted. 

1227 

A  check  clause  applies  and  one  of  the  data  items 

di d  not  pass . 

1223 

The  area  specified  for  the  record  is  not  one  of 
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those  in  the  record's  "WITHIN"  clause. 

122^  The  execution  0*^  the  store  statement  would  cause  a 
set  occurence  to  have  records  in  both  temporary 
and  permanent  areas. 

1219  The  value  of  an  item  cannot  be  converted  to  the 
tvpe  specified  in  the  schema  for  that  item. 

g.  Modi f y  . 

The  modify  function  enables  the  updating  of  some 
or  all  of  the  data  items  defined  in  the  sub-schema  for  a 
record  and  the  chanaing  of  set  occurences  in  which  a  record 
participates.  The  parameters#  which  are  all  optional#  are  a 
record  name#  a  list  of  items  in  the  record  and  set  parame¬ 
ters  identical  to  those  of  the  insert  function  (see  Section 
B.2.j).  If  the  items  are  not  specified#  then  every  item  in 
the  record  which  is  known  to  the  sub-schema  is  updated#  oth¬ 
erwise  only  the  named  items  are  updated.  If  the  set  names 
are  specified#  the  action  taken  is  eauivalent  to  a  remove 
function  followed  by  an  insert  function  for  the  named  sets 
with  the  followina  exceptions.  The  record  must  be  in  an 
occurence  of  every  set  named  prior  to  the  modify  function. 
The  set  membership  in  the  named  sets  can  be  defined  as  man¬ 
datory  or  automatic#  or  both. 

The  object  of  the  modify  is  the  current  record 
of  the  process.  All  data  items  to  be  updated  and  all  items 
required  for  an  insert  on  the  named  sets#  must  be  initial¬ 
ized  for  the  modify.  If  any  of  the  modified  data  items  are 


sort  control  items  for  a  set  occurence  in  which  membership 
is  retainedf  the  position  within  the  set  is  modified  accord¬ 
ingly.  If  any  of  the  items  changed  are  in  a  "SEARCH”  key 
clause^  the  index  is  updated.  The  record  becomes  the  current 
record  of  its  record  type  and  all  sets  it  has  membership  in. 

If  an  error  occurs  during  a  modify^  no  data  base 
or  currency  changes  are  made  and  an  error  code  is  returned 
to  e r ro r  .  s t a t us  .  The  possible  error  conditions  and  the 
associated  codes  are  as  follows. 


0803 

One  of  the  items  changed  is  in  a  CALC  key  and  the 

data  base  key  would  be  altered^  or  an  area  number 

specified  for  owner  record  selection  disagrees 

with  the  CALC  key  developed  for  the  owner. 

0825 

A  set  occurence  satisfying  the  specified  criteria 

was  not  found. 

0822 

The  record  is  not  currently  a  member  of  every 

spec i f i ed  set . 

0805 

The  insertion  of  the  record  into  a  set  occurence 

would  violate  a  "DUPLICATES  NOT  ALLOWED"  clause. 

0810 

A  privacy  breach  was  attempted. 

0827 

A  check  clause  was  failed. 

0821 

Some  record  occurence  affected  by  the  modify  is  in 

an  area  which  is  not  open. 

0821 

The  object  record  or  some  record  occurence 

affected  by  the  modify  is  in  an  area  which  is  not 

open  for  update. 

0818 

Some  record  occurence  which  is  implicitly  refer- 
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enced  is  in  an  area  open  for  exclusive  use  bv 
another  process* 

0819  A  modified  data  item  cannot  be  converted  into  the 
format  used  bv  the  schema  for  the  item* 

082^  Insertion  of  the  object  record  into  some  set 
occurence  would  cause  that  set  occurence  to  have 
members  in  both  temporary  and  permanent  areas* 

0861  No  call  to  the  permit  function  has  been  made* 

In  all  cases#  no  chanpe  is  made  to  the  data  base  or  to  the 
currency  indicators  of  the  p^rocess* 

h  *  Key  * 

The  key  function  allows  the  extraction  of  the 
data  base  key  for  one  of  the  current  records*  The  function 
needs  one  parameter  which  may  be  a  record#  set  or  area  name 
or  the  word  '^process"*  The  function  returns  a  pointer  to  a 
character  array  containina  the  data  base  key  for  the  current 
record  of  the  input  parameter.  The  key  should  be  treated  as 
read  only* 


If  an  error  occurs  during  the  kev  function#  a 
null  pointer  is  returned  and  the  error  code  is  returned  to 
e r r o r * s t a t u s  .  The  error  conditions  possible  are  no  current 
record  exists  for  the  input  parameter  passed  (code  130b)  and 
no  call  to  the  permit  function  (code  1361)* 
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Delete 


The  delete  function  Is  used  to  destroy  the 
current  record  of  the  processr  releasing  its  data  base  key 
and  storage/  and  to  selectively  delete  all  of  the  records 
which  are  members  of  set  occurences  owned  by  the  current 
record  of  the  run  unit#  The  function  reauires  a  single 
integer  parameter  in  the  range  zero  to  three  with  meaning  as 
follows#  A  zero  parameter  causes  deletion  of  the  record  if 
and  only  if  it  is  not  the  owner  of  any  non-empty  set 
occurences#  If  the  parameter  value  is  one/  the  record  is 
deleted/  all  optional  members  are  removed  from  its  set 
occurences  and  all  mandatory  members  of  its  set  occureneces 
are  deleted#  If  the  carameter  is  two/  the  action  is  identi¬ 
cal  to  that  of  one  exceot  that/  if  any  of  the  records  whose 
membership  is  optional  do  not  participate  in  set  occurences 
owned  by  a  different  record/  then  they  are  deleted  also.  If 
the  parameter  value  is  three/  then  the  record  and  every 
member  of  its  set  occurences  are  deleted.  For  any  member 
record  deleted/  the  deletion  of  the  member  records  in  that 
record's  sets  is  decided  as  if  that  record  were  the  object 
of  a  delete  function  with  an  identical  parameter  as  that  for 
the  originally  deleted  record. 

If  an  error  occurs  durina  the  function/  no 
records  are  removed  or  deleted  and  an  error  code  is  returned 
to  error. status.  The  oossible  errors  are  as  follows. 

0230  A  delete  with  parameter  zero  was  attempted  and  the 
record  owns  a  non-empty  set  occurence. 
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0213 


The  current  record  of  the  process  is  unknown* 

0210  A  privacy  breach  was  attempted. 

0221  One  of  the  affected  member  records  is  in  an  area 
which  is  not  open. 

0209  The  current  record  of  the  process  or  some  affected 
record  is  in  an  area  open  for  retrieval  only. 

0218  An  implicitly  referenced  record  is  in  an  area 
which  is  open  for  the  exclusive  use  of  another 
process  . 

0208  The  sub^schema  does  not  know  about  all  the  record 
types  which  would  be  deleted  or  removedr  or  all  of 
the  set  types  of  set  occurences  which  would  have 
records  removed. 

j.  Insert. 

This  function  causes  the  current  record  of  the 
process  to  become  a  member  of  an  occurence  of  the  specified 
set  types#  orovidina  it  is  defined  as  an  optional  automatic# 
optional  manual  or  mandatory  manual  member  of  those  sets. 
The  parameters#  which  are  optional#  are  a  record  type  and 
one  or  more  set  names.  Additional  parameters  may  follow  each 
set  name  deoendina  on  the  selection  criteria  for  the  member 
entry  of  the  object  record's  type.  If  the  root  set  in  the 
selection  oath  Kas  AT  A-B  ASE'-KEY  **  specified  with  the 
''da  t  a-base-dat  a-name- 1 option  (see  Ref.  2#  Section  3.^.0)# 
a  pointer  to  an  array  containing  a  data  base  key  or  an  item 
name  of  type  dbkey  must  he  included.  If  the  root  set  has 
the  "CALC-KEY**  option  with  the  **  da  t  a-b  a  se -da  t  a-name  s  ** 
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soecifiedr  item  names  or  Dointers  to  data  whose  type  matches 
that  of  the  corresponding  items  in  the  CALC  key  must  be 
included*  For  each  set  after  the  root  in  the  selection  path 
which  uses  the  "EQUAL  TO  da t a*base-da t a-name-^ "  option  (see 
Ref*  2r  Section  3*^*0)r  an  item  name  or  pointer  to  data 
which  matches  the  type  of  the  data  item  specified  in  the 
selection  clause  must  be  included*  In  addition  to  the 
explicit  parameters  abover  all  data  items  needed  in  the 
selection  path  as  specified  in  the  selection  clause  must  be 
specified*  If  the  owner  record’s  "WITHIN"  clause  specifies 
multiple  areasr  "areaid”  or  the  appropriate  data  item  must 
be  initialized  to  the  appropriate  area*  See  Reference  2f 
section  3*^*11  for  a  description  of  the  selection  clause* 
If  a  set  name  is  specified  with  no  additional  oarameterSf 
then  the  set  used  is  the  current  set  of  that  type* 

If  the  set  names  are  specifiedr  the  record  must 
not  be  in  an  occurence  of  any  of  the  named  set  type*  If  no 
set  names  are  specified^  the  record  is  inserted  into  the 
current  occurence  of  each  set  type  for  which  the  record  is 
defined  as  optional  automatic#  optional  manual  or  mandatory 
manual  provided  the  record  does  not  already  participate  in  a 
set  of  that  type*  After  the  insert#  the  record  becomes  the 
current  record  of  every  set  to  which  it  has  been  added* 

If  an  error  occurs  during  an  insert#  the  data 
base  remains  unchanged#  no  currency  indicators  change  and 
the  appropriate  error  code  is  returned  into  e r r o r  .  s t a t us . 
The  possible  error  conditions  are  as  follows* 


0713 

07ia 

0705 


0710 

0716 


0720 

0721 

0709 

0710 

072a 


0761 


The  current  record  of  the  process  is  unknown. 

Set  names  are  specified  and  the  record  is  not 
defined  as  an  optional  automatic*  optional  manual 
or  mandatory  manual  member  of  each  of  them. 

The  record*  when  inserted*  would  violate  a  "DUPLI¬ 
CATES  NOT  ALLOWED"  clause  for  some  record  or  set 
involved. 

The  current  record  of  some  set  name  specified  in  a 
"CURRENT"  clause  of  a  selection  entry  is  unknown. 
The  record  is  already  in  an  occurence  of  a  set 
explicitly  specified  or  of  every  set  implicitly 
soeci tied. 

The  record  type  was  passed  as  a  parameter  and 
disagrees  with  the  type  of  the  current  record  of 
the  process. 

A  record  occurence  which  is  affected  is  in  an  area 
which  is  not  open. 

The  record  inserted  or  some  affected  record  is  in 
an  area  which  is  ooen  for  retrieval  only. 

A  record  occurence  implicitly  referenced  by  the 
insert  is  in  an  area  which  is  not  available. 
Insertion  of  the  record  into  a  set  would  cause  the 
set  to  have  members  in  both  temporary  and  per¬ 
manent  areas . 

No  call  to  the  Permit  function  has  been  made. 
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k.  Remove 


This  function  is  used  to  cancel  the  membershio 
of  the  current  record  of  the  orocess  in  specified  set 
occurences  for  which  the  record's  membership  is  optional. 
The  parameterSf  which  are  oPtionaW  are  a  record  name  and 
one  or  more  set  names.  If  the  set  names  are  specified#  the 
object  record  must  participate  in  an  occurence  of  at  least 
one  of  them  and  its  membershio  in  each  of  them  is  canceled. 
If  no  set  names  are  specified#  every  optional  membership  in 
a  set  occurence  for  the  record  is  cancelled. 

If  an  error  occurs  during  the  remove#  no  set 
memberships  are  canceled#  no  currency  information  is 
affected  and  the  error  condition  is  returned  into 
error, status.  The  followina  errors  are  possible. 

1113  The  current  record  of  the  process  is  not  known, 

1120  A  record  type  parameter  was*  passed  and  it 
disagrees  with  that  of  the  current  record  of  the 
process  • 

1115  The  record  is  not  defined  as  an  optional  member  of 
any  named  set  type, 

1122  The  record  does  not  participate  in  at  least  one  of 
the  sets  named;  or  if  no  sets  are  named#  in  at 
least  one  of  the  possible  sets  for  which  it  is 
OPt i ona 1  . 

1110  A  privacy  breach  was  attempted, 

1121  Some  record  affected  by  the  remove  is  in  an  area 


which  is  not  open 


1109  The  current  record  of  the  process  or  some  affected 
record  is  in  an  area  which  is  not  open  for  update* 
1118  Some  implicitly  referenced  recora  is  in  an  area 
opened  for  exclusive  use  by  another  process* 

1161  No  call  to  the  permit  function  has  been  made* 
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APPENDIX  B.  files  ASSOCIATED  WITH  A  SCHEMA 


A,  Files  in  the  Schema  Directory. 


Most  of  the  files  associated  with  a  schema  are  contained 
in  a  directory  bear i no  the  name  of  the  schema.  This  direc¬ 
tory  becomes  the  current  directory  for  the  schema  DBM  pro¬ 
gram.  In  the  descriotion  of  the  files  within  the  schema 
directory^  the  term  "schemaname"  indicates  a  variable  por¬ 
tion  of  a  file  name  which  is  reolaced  by  the  name  of  the 
particular  schema  when  the  files  are  named. 


1.  Source  Descriotion  Fi'le. 

The  Source  Descriotion  File  contains  the  schema 
descriotion  in  the  source  CODASYL  DDL  form.  Its  name  is 
"s.schemaname" . 


2.  Encoded  Description  File. 

The  Encoded  Description  File  contains  the  comriled 
descriotion  of  the  schema.  It  contains  data  base  names  and 
encoded  descriptions  for  the  areas,  records  and  sets  in  the 
schema.  It  is  used  principally  by  the  schema  DBM  program  in 
the  initialization  orocess.  Its  name  is  "des.schemaname'*. 
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3.  Schema  DBM  Program 


The  schema  OB^"  oroqram  is  the  data  base  manager  for 
the  schema.  It  is  comprised  of  the  DBM  skeleton  routine 
compiled  together  with  any  data  base  orocedures  used  in  the 
schema.  Its  name  is  " dbm . sc hemaname " . 

y.  Schema  Library. 

This  file  is  optional  andf  when  presents  contains 
data  base  procedures  unioue  to  the  schema.  It  is  named 
"1  ib.schemaname". 

5.  Area  Data  Files. 

These  files  contain  the  data  for  all  the  defined 
areas  in  the  schema  which  are  not  designated  as  temoorary. 
Their  names  are  the  same  as  the  areas  which  they  represent. 

6.  Area  Data  Base  Key  Files. 

These  files  contain  the  byte  offsets  associatea  with 
each  data  base  key  for  thn  areas  which  are  not  designated  as 
temoorary.  The  files  are  named  by  orefixina  the  area  name 
by  "k.”. 

7.  Index  Block  File. 

This  file  provides  storaae  for  all  the  indices  used 
for  set  linkaoe  in  the  data  base.  It  is  called 
"index.schemaname". 


8.  Open  Lockout  File 


This  file  is  used  by  intear i tv  routines  to  lock  out 
other  orocesses  when  setting  up  exclusive  or  orotected 
access  priveleges  for  a  user  orocess.  It  is  created  to  ini¬ 
tiate  any  open  or  close  operation  and  removed  when  the 
operation  is  completed.  The  file  is  named  "opendum”. 

Index  Lockout  File. 

This  file  is  used  to  lock  out  other  processes  while 
attemoting  to  acQuire  an  index  from  the  index  block  file. 
It  is  handled  in  a  manner  analogous  to  the  open  lockout 
file.  The  file  is  name  "indexdum". 

10.  Message  Buffer  File. 

This  file  is  used  by  the  schema  08M  program  to 
assemble  messaaes  to  the  user  proaram  which  are  longer  the 
512  characters.  Prior  to  storing  a  new  message^  the  file  is 
truncated  to  zero  length. 


B.  Files  in  the  Temporary  Directory. 

Certain  files  for  a  schema  are  stored  in  the  UNIX  tem¬ 
porary  directory  C'Vtmp").  This  directory  has  the  charac¬ 
teristic  that  should  a  system  crash  occurs^  all  the  files 
contained  within  it  are  lost.  This  directory  is  used  to 
store  files  which  are  associated  with  the  running  of  a  pro” 
cess  and  therefore  should  be  lost  if  the  orocess  is  ter¬ 
minated  by  a  system  crash.  The  files  are  as  follows. 
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Area  Files 


These  files  are  the  area  data  files  and  area  data 
base  key  files  for  all  areas  designated  as  temporary.  The 
naming  conventions  for  these  files  are  identical  to  those 
for  non-t emoo r a ry  data  and  data  base  key  files  with  the  exe- 
ceotion  that  the  process  id  (pid)  of  their  user  process  is 
suffixed  to  the  name. 

2.  Logical  Usage  Block  File. 

This  file  contains  the  loaical  usage  block.  This 
block  is  used  during  open  and  close  operations  to  record  the 
usage  modes  for  the  various  areas  currently  in  use.  Its  name 


is  the  same  as  the  name  of  the  schema 


APPENDIX  C.  DBM  -  THE  DBM  REQUEST  PROCESSHR. 

I 


A,  Introduction. 

Dbm  is  a  simple  command  1 anguaae  orocessor  for  schema 
level  requests.  It  enables  the  data  administrator  to  per¬ 
form  such  functions  as  compilinq  a  schema^  moving  data  from 
schema  to  schema  and  Garbage  collection.  It  provides  the 
user  with  a  method  of  executing  a  orogram  to  utilize  the 
schema.  The  functions  of  ''ALTER'',  "DISPLAY"  and  "LOCKS" 
described  in  Ref.  2  are  provided  by  different  means. 
Namely,  the  UNIX  "ed"  and  "list"  functions  are  used,  with 
privacy  provided  by  the  file  access  privacy  of  UNIX.  The 
function  "COPY"  (for  subschema  use)  is  inapplicable  since 
the  C  language  DDL  is  not  a  proper  subset  of  the  schema  DDL 
as  was  the  case  with  COBOL.  In  addition,  the  cross  checking 
of  the  sub-schema  ano  the  schema  is  done  at  execution  time. 

Prior  to  usino  dbm,  the  schema  directory  must  exist 
(UNIX  function  "mkdir")  and  the  schema  source  file  should 
have  already  been  created  using  "ed",  the  UNIX  text  editor. 
Dbm  is  called  by  entering  "dbm  oathname",  where  the  pathname 
is  a  oath  endina  with  the  schema  name  of  the  schema  to  be 
used.  The  program  will  respond  in  one  of  two  ways:  it  will 
display  "cannot  access  schema"  or  ">".  The  first  response 
indicates  that  either  the  schema  specified  does  not  exist) 
access  privacy  prevents  access?  the  schema  is  not  a 
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directory;  or  a  file  called  **s  •  55chemaname”  does  not  exist  in 
the  directory^  where  '’schemaname"  is  the  name  of  the  schema. 
This  response  is  followed  by  immediate  termination  of  the 
proqram.  The  second  response  is  the  dbm  prompt  character 
and  means  that  dbm  is  ready  to  accept  commands. 

B  •  Commands . 

Upon  recieving  the  prompt  character^  the  user  has  the 
option  of  soecifyina  any  of  six  commands  as  follows. 

\ 

1 .  Como i 1 e . 

The  compile  command  causes  the  schema  to  be  com¬ 
piled.  The  command  format  is  ”c”  followed  by  a  carriage 
return.  The  compilation  process  causes  the  scanning  of  the 
schema  source  filer  *’s.schemaname”r  and  creation  of  the 
encoded  schema  description  file#  "des . schemaname” r  and  the 
schema  data  base  manaoerr  " dbm . sc hemaname ” •  If  the  necessary 
permissions  are  not  present  to  create  these  filesr  dbm 
displays  *’cannot  compile**.  If  errors  exist  in  the  source 
filer  they  are  displayed.  In  order  to  divert  the  error  listr 
an  optional  path  name  parameter  is  allowed  with  the  **c”  com¬ 
mand.  If  the  specified  Hie  can  be  openedr  the  error  listina 
is  output  to  itr  otherwise  an  error  message  is  displayed  at 
the  user*s  terminal.  V’lhen  the  *’c**  command  is  finishedr  the 
user  receives  a  prompt.  The  compiler  is  currently  a  stub. 
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2*  Move 


This  command  allows  data  to  be  moved  from  an  old 
version  of  a  data  base  to  the  current  one.  The  command  for¬ 
mat  is  ”m"  followed  bv  a  oath  to  a  schema  directory.  Move 
will  check  the  specified  schema  name  to  determine  if  it  is  a 
directory  containing  an  encoded  schema  description  and 
schema  data  base  manager.  If  the  schema  is  nonexistent  or 
i  naccessab  1  e  f  dbm  will  display  ’’schemaname  cannot  be 
accessed*^r  otherwise  the  data  from  the  designated  schema 
will  be  moved  to  the  current  schema.  The  data  moved  is 
selected  by  finding  all  area#  record  and  set  entries  with 
common  names  and  transfering  the  data  which  is  associated 
with  these  common  areas#  records  and  sets.  Area#  record  and 
set  entries  should  have  the  same  order  in  both  schemas.  All 
data  presently  in  the  current  schema  will  be  lost.  If  the 
move  is  unsuccessful#  move  oroduces  error  messages.  After 
the  move  is  completed#  the  user  receives  a  prompt.  The  move 
function  is  currently  a  stub. 

3 .  Execute . 

This  command  causes  the  execution  of  a  user  program 
to  access  the  data  base.  Its  format  is  ”  x followed  by  a 
path  to  a  user  program  and  the  arguments  for  that  user  pro¬ 
gram.  If  the  user  program  is  i naccessab 1 e #  nonexistent  or 
not  a  program#  dbm  orints  an  error  message  and  prompts. 
Otherwise  dbm  executes  the  user  program  ana#  upon  its  termi- 
na  t ion#  prompt  s . 
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4.  Garbaoe  Collection. 

This  command  allows  waste  comoression  in  area  data 
files  for  the  data  base.  The  format  is  "g"  followed  by  a 
carriage  return.  This  causes  the  followino  events  for  each 
area  in  the  schema.  The  message  "number  of  bytes  wasted  in 
areaname  is  NNN.  Collect?  (y  or  n)"  to  be  displayed/  where 
"areaname"  is  the  area  beina  processed  and  "NNN"  is  the  area 
waste  count.  Enterinc  "y"  causes  the  area  file  to  be 
recreated  with  all  records  written  in  ascending  order  of 
data  base  key  and  with  all  wasted  space  eliminated.  Enter¬ 
ing  "n"  causes  the  next  area  to  be  processed,  when  all  areas 
have  been  processed/  the  user  is  orompted. 

/ 

Due  to  the  lack  of  a  aarbaae  collection  facility  in 

the  schema  dbm  skeleton/  freauent  aarbage  collection  may  be 

\ 

necessary.  Note  aarbage  collection  causes  any  assianment  of 
data  base  keys  designed  to  juxtapose  related  records  to  be 
reflected  in  the  area  data  file  as  well. 

5 .  F  ree  . 

If  UNIX  crashes  durina  a  dbm  spawned  function/  cer¬ 
tain  files  may  be  left  in  a  state  makina  restart  impossible. 
The  command  "f"  followed  by  carriage  return  causes  this  con¬ 
dition  to  be  eliminated.  The  free  command  removes  the  Hies 
"opendum"  and  "indexdum"  from  the  schema  directory/  if  they 
exist/  and  scans  the  index  block  file/  " i ndex . schemaname" / 
freeing  any  locked  indices. 
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C.  Interorocess  Communication. 

Whenver  dbm  must  create  a  orocess^  it  uses  the  UNIX 
functions  "fork"  and  "exec"*  The  former  causes  a  comolete 
copy  of  the  current  process^  called  the  child^  to  be  created 
and  the  later  causes  the  current  Process  to  be  overlaid  and 
reolaced  by  the  proaram  specified.  Dbm  creates  children  as 
needed  to  do  its  work.  Whenever  a  need  may  arise  for  the 
children  to  communicate  with  each  others  dbm  creates  inter¬ 
process  communications  pipes.  These  pipes  appear  to  be  a 
pair  of  files#  called  the  ends#  each  with  an  open  file 
descriptor.  One  end  of  the  pipe  is  open  for  readinq  and  the 
other  for  writing.  Since  all  children  of  a  parent  executina 
a  pipe  call  have  the  pipe  open  also#  the  pipe  can  be  used 
for  passing  data  back  and  forth.  Certain  protocols  must  be 
observed#  however.  The  oioe  can  only  effectively  be  used 
for  one  way  transmission  since  there  is  no  protocol  for 
preventing  a  process  from  readina  its  own  tranmissions  back 
before  the  intended  receiver  has  a  chance  to  read  them.  The 
receiver  should  close  the  writing  end  of  the  oioe#  otherwise 
the  receiver  will  wait  forever  if  trying  to  read  the  pipe 
after  the  sending  process  has  terminated.  This  phenomenon  is 
caused  by  the  fact  that  the  process  reading  a  pipe  will  go 
into  wait  state  if  any  process#  including  the  process  doing 
the  read#  has  its  writing  end  of  the  pipe  ooen.  If  no  pro¬ 
cess  has  its  writing  end  open  (termination  automatically 
closes  all  of  a  process*  ooen  files  and  pipes)#  a  read  on 
the  pipe  will  return  an  end  of  file  condition. 
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APPENDIX  D.  SCHEMA  DESCRIPTION  FILE  FORMAT 


The  schema  description  file  contains  the  encoded  schema 
description.  The  file  is  used  by  the  dbm  move  command  and 
in  initializing  the  schema  DBM  program.  Its  format  is 
described  below. 

A .  Schema  Entry. 

The  schema  entry  is  headed  by  a  null  terminated  string 
containing  the  schema  name.  Next  is  a  privacy  lock  consist¬ 
ing  of  a  null  character  ^  if  no  privacy  lock  is  defined;  or  a 
one  character  type  followed  by  a  null  terminatated  string^ 
if  a  privacy  lock  is  defined.  Tf  the  lock  is  defined/  a 
lock  type  of  *’s”  indicates  that  the  string  is  a  lock  string 
and  indicates  a  lock  data  base  procedure  name. 

B.  Area  Entries. 

The  area  entries  are  ©receded  by  a  two  byte  number  which 
is  the  number  of  areas  areas  and  a  two  byte  maximum  record 
size.  Each  entry  contains  the  followina  items. 

The  area  name  is  a  null  terminated  string.  The  tem¬ 
porary  indicator  is  a  one  character  flag  which  is  equal  to 
one  for  temporary  areas  and  zero  otherwise. 
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Fourteen  data  base  procedure  names  are  stored  next#  The 
first  six  names  are  Procedures  to  be  called  when  the  open 
functions  for  retrieval#  protected  retrieval#  exclusive 
retrieval#  update#  protected  update#  and  exclusive  update# 
are.  executed  normally.  The  seventh  name  is  a  procedure  to 
be  executed  when  a  close  is  executed  normally#  The  final 
seven  names  are  procedures  co r respond i nq  to  the  first  seven# 
but  which  are  executed  when  errors  occur.  If  a  procedure  is 
not  specified  for  a  function#  a  null  string  will  appear  in 
the  file  at  the  appropriate  position. 

Following  the  data  base  procedure  names  are  the  six 
privacy  lock  entries#  These  locks  have  the  same  format  as 
the  schema  privacy  lock.  The  six  locks  apply  to  the  open 
function  for  retrieval#  protected  retrieval#  exclusive 
retrieval#  update#  protected  update  and  exclusive  update 
respec  t i ve 1 y  # 

C  .  Reco  rd  Entries# 

The  record  entries  include  information  Generated  by  the 
member  subentries  of  the  schema's  set  entries  as  well  as 
information  from  the  schema's  record  entries#  The  record 
entries  are  preceded  bv  a  one  bvte  number  indicating  the 
number  of  record  tynes  present.  Each  record  entry  contains 
the  following  data. 

The  record  name  is  a  null  terminated  string.  It  is  fol* 
lowed  by  a  two  byte  signed  integer  indicatina  record  size 
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for  records  o^  this  entry  tyoe.  A  one  byte  location  mode  is 
next*  Additional  location  mode  information  may  follow 
depending  on  the  mode?  for  modes  zero  and  seven/  no  addi* 
tional  information?  for  mode  one/  a  one  byte  record  index 
and  a  one  byte  item  index?  for  modes  two  and  three/  two  or 
more  one  byte  item  index  numbers  preceded  by  a  one  byte 
number  indicating  the  number  of  indices  present?  for  modes 
four  and  five/  a  null  terminated  string  naming  a  data  base 
procedure  and  two  or  more  one  byte  item  indices  preceded  by 
a  one  byte  number  indicating  the  number  of  indices  present? 
and  for  mode  six/  a  one  byte  set  index*  The  location  mode 
infromation  is  derived  form  the  LOCATION  clause  of  the 
record  entry  and  the  encoding  matches  that  in  *’r1ocmod”  of  a 
schema  DBM  record  vector. 

Following  the  location  information  is  the  area  data 
derived  from  the  record  type's  WITHIN  clause*  This  consists 
of  a  one  byte  option  code  and  area  specifications*  The  area 
specification  format  depends  on  the  option  code:  for  code 
zero/  a  one  byte  area  index?  for  code  one/  two  or  more  one 
bvte  area  indices  preceded  by  a  one  byte  number  indicating 
the  number  of  indices  present?  and  for  two/  no  further  data* 
The  encoding  of  the  WITHIN  information  matches  that  of 
''rarea*'  in  the  schema  DBM  record  vector. 

Fourteen  data  base  procedure  names  are  stored  next  in 
the  entries.  The  first  seven  are  procedures  to  be  called 
when  the  functions  of  "insert"/  "remove"/  "store"/  "delete"/ 
"modify"/  "find"  and  "get"/  are  executed  normally  on  records 
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of  this  entry's  type.  The  second  seven  data  base  procedures 
are  called  when  any  of  the  above  seven  functions  is  executed 
and  an  error  occurs.  If  no  orocedure  is  defined  in  the 
schema  for  a  function#  a  null  string  replaces  the  function. 

Following  the  data  base  procedure  names  are  seven 
privacy  lock  entries.  These  locks  have  the  same  format  as 
the  schema  orivacy  lock.  The  locks  apply  to  the  seven  func¬ 
tions  listed  in  the  previous  paragraph. 

1 .  Member  Data. 

Each  record  entry  has  zero  or  more  set  membership 
entries  followina  the  record  orivacy  locks.  These  entries 
are  preceded  by  a  one  byte  number  indicatina  the  number  of 
membership  entries  present.  The  member  entries  for  each 
record  type  appear  in  the  same  order  as  the  set  entries  in 
the  schema  for  which  membership  is  defined.  The  contents  of 
each  membership  entry  is  as  follows. 

The  set  name  for  the  membership  is  stored  as  a’  null 
terminated  strina.  Following  the  set  name  is  a  two  byte 
series  of  flag  bytes  which  correspond  to  the  bits  of 
''mflags”  in  a  schema  DBM  member  vector.  The  information 
contained  in  these  bits  is  derived  from  the  MEMBER  clause# 
the  KEY  clause  and  the  total  number  of  SEARCH  clauses 
defined  in  the  schema. 

Next  is  a  one  byte  number  indicating  the  number  of 
items  included  in  the  primary  key  for  the  item.  This  number 
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is  at  most  t6  and  is  zero  if  no  key  is  defined*  Following 
this  value  is  the  appropriate  number  of  orimary  key  element 
pairs*  Each  key  oair  consists  of  a  one  byte  collatina  code 
and  an  item  index*  The  collating  code  is  zero  if  this  ele¬ 
ment  of  the  primary  key  is  ascending  and  one  if  it  is  des- 
cendi ng* 


Following  the  primary  key  specification  are  up  to 
seven  search  key  springs  (^he  exact  number  is  recorded  in 
the  flag  bytes  above)*  Each  search  key  string  is  a  null 
terminated  strino  of  the  item  indices  for  the  items  in  the 
search  key* 

Following  the  search  key  strings  is  the  set  selec¬ 
tion  data*  If  Format  2  of  the  SELECTION  clause  was  used# 
this  data  consists  of  the  name  of  a  data  base  procedure*  If 
Format  1  was  used#  the  data  is  as  follows*  First  is  a  one 
byte  code  indicatina  the  root  selection  mode*  The  code 
corresponds  to  that  in  "mselflag**  of  a  schema  DBM  member 
vector*  The  remaining  root  selection  data  depends  on  the 
root  selection  mode*  For  mode  one  and  two#  a  one  byte  set 
index  follows  the  mode*  For  mode  three#  there  is  no  further 
root  selection  data*  For  mode  four#  the  data  is  a  null  tem- 
inated  string  of  two  byte  Pairs  each  of  which  contains  a 
record  index  and  a  set  index* 

The  remaining  set  selection  data  for  Format  I  con¬ 
sists  of  the  number  of  "THEM  THRU"  clauses  followed  by  the 
appropriate  number  of  two  byte  selection  pairs*  Each  pair 
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contains  a  set  index  and  an  item  index*  These  pairs  are  the 
source  data  for  ”msse1”  in  a  schema  DBM  member  vector. 

Following  the  set  selection  information  are  six  data 
base  orocedure  names.  The  first  three  orocedures  are  called 
when  the  functions  of  "insert "remove”  and  "find”  are  exe* 
cuted  normally.  The  second  three  procedures  are  called  when 
these  same  functions  are  executed  and  an  error  results*  If 
no  data  base  orocedure  is  defined  for  a  oarticular  function^ 
a  null  string  will  appear  in  its  oosition. 

Following  the  data  base  orocedure  names  are  three 
privacy  lock  entries  in  the  same  format  as  the  schema  lock. 
These  privacy  locks  are  for  the  functions  described  in  the 
previous  paragraph, 

2,  Item  Data, 

Each  record  entry  has  one  or  more  item  descriotion 
entries  following  the  set  membership  entries.  The  item 
entries  are  preceded  by  a  one  byte  number  indicating  the 
number  of  items  present.  The  item  entries  are  stored  in  the 
same  order  that  the  items  they  represent  apoear  in  the 
record  tyoe  being  described.  The  contents  of  each  item 
entry  is  as  follows. 

The  first  data  in  an  item  entry  is  the  name  of  the 
item  stored  as  a  null  terminated  string.  If  the  item  is  one 
that  is  not  generated  by  an  item  sub-entrv  in  the  schema# 
the  item  name  will  be  a  null  string.  Following  the  name  is 
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a  one  byte  level  number,  A  level  number  between  one  and  100 
is  generated  bv  a  schema  item  sub-entry;  101  is  a  forward 
link;  102  is  a  backward  link;  103  is  a  link  to  owner;  lOU  is 
an  owner's  link  to  first  member;  105  is  an  owner's  link  to 
last  member;  106  is  an  owner's  link  to  index  and  107  is  a 
CALC  synonym  link. 

V 

The  remaining  data  deoends  on  the  level  number.  If 
the  level  number  is  between  one  and  100»  inclusive#  the  next 
byte  contains  the  data  tyoe;  between  101  and  106#  inclusive# 
the  next  byte"  is  the  set  index#  and  for  107  no  other  data  is 
needed.  If  a  picture  is  defined  for  tne  item#  it  is  stored 
next  as  a  null  terminated  string. 

Following  the  level  and  type  data  is  a  validity 
checking  description.  The  validity  checking  description  is 
a  null  terminated  strina  which  is  encoded  to  fit  the 
requirements  of  "icheck"  in  a  schema  OBM  item  vector.  It  is 
generated  bv  the  CHECK  clause  of  a  schema  item  sub-entry. 
If  no  validity  check  is  defined  for  the  item#  the  string  is 
null. 


Following  the  validity  check  description  are  three# 
two  byte  numbers  representing  the  size  (in  bytes)  of  one 
occurence  of  the  item;  tne  number  o-f  occurences  of  the  item 
in  a  record;  and  the  starting  byte  number  of  the  item  within 
the  record. 

The  names  of  six  data  base  orocedures  are  next.  The 
first  three  are  names  of  procedures  to  be  called  when  the 
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functions  of  "store''>  "oet"f  and  "modify'*  are  executed  nor 


The  second  three  procedures  are  called  when  these 
functions  are  executed  and  an  error  occurs.  If  a  procedure 
is  not  defined  for  a  function,  a  null  string  will  appear  in 
its  place. 

Following  the  data  base  procedure  names  are  three 
privacy  lock  entries  in  the  same  format  as  the  schema  lock. 
These  locks  apply  to  the  functions  mentioned  in  the  previous 
paragraph. 


D .  Set  Entries, 

Following  the  record  entries  are  the  set  entries.  These 
entries  are  generated  by  set  sub-entries  in  the  schema.  The 
entries  are  preceded  by  a  one  byte  number  indicating  the 
number  of  sets  defined.  The  contents  of  each  entry  is  as 
follows. 

The  first  element  in  each  entry  is  the  set  name  stored 
as  a  null  terminated  string.  The  set  name  is  followed  by  a 
one  byte  code  which  corresponds  to  the  lower  order  byte  of 
"sflags"  in  a  schema  DBM  vector  and  describes  OWNER,  SET  IS 
and  ORDER  clauses  of  the  set  sub-entry. 

Next  is  a  pair  of  bytes  indicating  the  owner  record. 
The  first  byte  is  the  owner  record's  index  and  the  second  is 
the  item  index  of  the  first  item  in  the  owner  record  having 
to  do  with  the  set.  Following  the  owner  record  data  is  a 
one  byte  number  inaicating  the  number  o-^  member  records  and 


129 


three  byte 


member  descriptions  oresent 


Each  member 


description  consists  of  the  record  index  of  the  member;  the 
index  of  the  set  membershio  vector  in  the  schema  DBM  record 
vector  for  the  record?  and  the  item  index  of  the  first  item 
in  the  record  having  to  do  with  this  set*  The  order  of  the 
member  descriptions  is  alphabetical  by  member  record  name* 

Following  the  member  descriptions  are  four  data  base 
procedure  names  stored  as  null  terminated  strings*  The 
first  two  are  names  of  procedures  to  be  called  when-  the 
*^insert”  or  ’’remove”  functions  are  executed  normally*  The 
last  two  represent  the  same  functions^  but  are  called  when 
an  error  occurs*  If  a  procedure  is  not  defined  in  the 
schema  for  a  function#  a  null  string  appears  in  that  place* 

Following  the  data  base  procedure  names  are  three 
privacy  locks*  These  locks  are  of  the  usual  format*  They 
lock  the  functions  of  ’’insert”#  ’’remove”  and  ’’find”#  respec¬ 
tively* 
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appendix  e.  interprocess  message  formats 


A,  Messaoes  Received  by  the  Schema  DBM, 

Messages  received  are  read  into  "^smesin”#  a  character 
buffer  of  lenath  512.  The  first  byte  of  the  messaae  is  a 
function  code.  The  remainder  of  the  message  will  vary 
depending  on  the  function  code.  The  message  is  terminated 
by  a  markr  which  is  ten  bvtes  of  the  octal  code  0252.  In 
the  message  descriptions  that  followr  the  function  code  is 
included  as  part  of  the  description  heading, 

1,  Initial  Call  Message  (Code  0), 

The  initial  call  is  made  by  the  user  to  reauest 
validation  of  his  sub-schema  and  to  establish  his  access 
permissions.  Immediately  following  the  function  code  is  a 
null  terminated  string  containing  the  schema  name.  After 
the  schema  name  is  a  null  terminated  string  containing  the 
privacy  key  for  the  schema.  Following  the  schema  entries 
are  the  area  entries. 

a.  Area  Entries. 

The  area  entries  are  preceded  bv  a  one  byte 
number  indicating  the  number  of  areas  in  the  sub-schema. 
Each  entry  consists  of  seven  null  terminated  strings.  The 
first  string  is  the  area  namer  the  other  six  strings  are 
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privacy  keys  and  mav  be  null  strings.  The  orivacy  keys 
specified  are  for  retrieval/  protected  retrieval/  exclusive 
retrieval/  update/  protected  update  and  exclusive  update 
respectively. 

b.  i^ecord  Entries. 

The  record  entries  are  preceded  by  a  one  byte 
number  indicating  the  number  of  records  in  the  sub-schema. 
Each  entry  consists  of  seven  null  terminated  strinas/  an 
encoded  member  list  and  an  encoded  item  list.  The  first 
string  is  the  record  name  and  the  remaining  six  are  orivacy 
keys  and  may  be  null  strings.  The  privacy  keys  are  for 
insert/  remove/  store/  delete/  modify  and  find  respectively. 

An  encoded  member  list  is  headed  by  a  one  byte 
number  indicatina  how  many  member  entries  follow.  Each 
member  entry  consists  of  four  null  terminated  strinas.  The 
first  string  is  the  name  of  the  set  and  the  remaining  three 
are  privacy  keys  and  may  be  null.  The  privacy  keys  are  for 
insert/  remove  and  find  respectively. 

An  encoded  item  list  is  headed  by  a  one  bvte 
number  indicating  how  many  item  entries  are  in  the  list. 
Each  entry  has  a  one  byte  entry  code  followed  by  four  null 
terminated  strings.  The  first  string  is  the  item  name  and 
the  rest  are  privacy  keys  and  may  be  null.  The  privacy  keys 
are  for  store/  get  and  modify.  The  remainder  of  the  item 
entry  varies  depending  on  the  entry  code  specified  below. 
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(1)  Atomic  Item  (Code  0).  No  'further  fields 
exist  in  the  item  entry  for  an  atomic  item. 

(2)  Vector  (Code  1).  A  one  byte  number  indi- 
eating  the  number  of  occurences  of  the  item  follows  the 
orivacy  keys. 

(3)  Repeating  Group  (Code  2).  A  pair  of  one 
byte  numbers  follows  the  privacy  keys.  The  first  number 
indicates  the  number  of  subsequent  item  entries  in  the 
repeating  group  and  the  second  indicates  the  number  of 
occurences  in  the  group. 

c .  Set  Ent  r i es  . 

The  set  entries  follow  the  record  entries.  They 
are  preceded  by  a  one  byte  number  indicating  the  number  of 
set  entries.  Each  set  entry  consists  of  four  null  ter¬ 
minated  strings.  The  first  strina  is  the  set  name  and  the 
remaining  three  are  privacy  keys  and  may  be  null.  The 
privacy  keys  are  specified  for  insertr  remove  and  find. 

2.  Open  Message  (Code  ^). 

The  function  code  is  followed  by  a  one  byte  moder 
which  uses  the  same  encodina  as  the  C  dbopen  function  (see 
Appendix  A,  Section  B.2.b).  The  remainder  of  the  message 
consists  of  one  byte  area  index  numbers.  No  area  numbers 
should  be  included  if  every  area  known  to  the  subschema  is 
to  be  opened. 
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3.  Close  Message  (Code  1) 


The  function  code  is  followed  by  a  list  of  one  byte 
area  index  numbers.  No  area  numbers  are  included  in  the 
message  if  all  open  areas  are  to  be  closed. 

4.  Find  Message  (Code  3). 

The  function  code  is  followed  by  a  one  byte  selec” 
tion  tyoe  and  selection  codes.  The  possible  selection  tynes 
and  their  corresponding  record  selection  codes  areJ 

a.  Explicit  Key. 

Code  zero  indicates  direct  access  and  the  selec¬ 
tion  code  will  be  a  four  byte  data  base  key. 

b.  Owner  Record. 

Code  one  indicates  selection  of  the  owner  record 
for  the  set  of  the  specified  type  that  the  soecified  record 
belongs  to.  The  first  selection  code  is  a  one  byte  set 
index  and  the  second  is  a  four  byte  data  base  key. 

c.  Relative  Area. 

Tyoe  code  two  specifies  relative  selection  in 
the  designated  area.  The  first  selection  code  is  a  one  byte 
criterion  with  zero  t  or\e  t  two  and  three  meaning  next/  previ¬ 
ous/  first  and  last/  respectively?  and  four/  five/  six  and 
seven  meaning  next/  previous/  first  and  last  of  a  soecified 
record  tyoe.  If  the  criterion  is  four  throuah  seven/  the 


second  selection  code  is  a  one  byte  record  index.  The  last 
selection  code  is  a  four  byte  data  base  key. 

d.  Relative  Set. 

Tyoe  code  three  soecifies  relative  selection  in 
the  designated  set.  The  first  selection  code  is  a  one  byte 
set  index.  The  remaining  selection  codes  are  identical  to 
those  for  tyoe  code  two. 

e.  CALC  Key. 

Tyoe  code  four  indicates  hash  key  selection. 
The  first  selection  code  is  a  one  byte  record  index.  The 
remaining  codes  are  item  triples  for  all  the  items  of  the 
specified  record  tyoe  which  are  know  to  the  sub-schema  and 
are  not  associated  with  an  "OCCURS”  clause.  An  item  triple 
consists  of  an  item  specification,  a  one  byte  data  type 
code,  and  a  data  value  of  the  specified  type.  An  item 
specification  code  consists  of  a  one  byte  item  index  fol¬ 
lowed  by  zero  or  more  one  byte  subscript  values  as  aopropri- 
ate.  The  data  tvpe  codes  are  one  for  inteoer,  two  for  sin¬ 
gle  precision  floating  ooint,  three  for  double  precision 
floating  ooint,  four  for  null  terminated  string  and  five  for 
data  base  key. 

f.  Duplicate  CALC  Key. 

Type  code  five  indicates  selection  of  the  next 
record  with  a  hash  key  duplicating  the  hash  key  of  the 
specified  record.  The  selection  code  is  a  four  byte  data 
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base  key 


g.  Current  Set  Data  Value. 

Type  code  six  indicates  tHat  the  first  record  of 
the  specified  type  which  matches  the  specified  item  values 
in  the  specified  set  occurence  is  to  be  selected.  The  first 
selection  code  is  the  one  byte  record  type  index  of  the 
record  type  to  be  selected.  The  second  selection  code  is  a 
one  byte  set  type  index.  The  third  selection  code  is  a  four 
byte  data  base  key  of  a  record  which  belongs  to  the  set 
occurence  to  be  scanned.  The  remaining  selection  codes  are 
zero  or  more  item  triples. 

h.  Selected  Set  Data  Value. 

Type  code  seven  indicates  that  the  first  record 
of  the  specified  type  which  matches  the  specified  item 
values  in  the  set  occurence  selected  through  the  specifed 
record  type's  member  subentry  "SELECTION"  clause  is 
selected.  The  Hrst  two  selection  codes  are  identical  to 
those  in  the  orecedino  oaraoraph. 

The  third  selection  code  is  a  one  byte  number 
which  indicates  the  number  of  oath  selection  codes  which 
follow.  The  oath  selection  codes  are  item  Quadruples.  An 
item  Quadruple  consists  of  a  one  byte  record  index,  an  item 
specification,  a  one  byte  data  type  code,  and  a  data  value 
of  the  type  specified  in  the  data  tyoe  code.  The  remaining 
selection  codes  are  zero  or  more  item  triples  for  items  in 
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the  specified  record  tyoe, 

i.  Current  Set  Duolicate  Value. 

Type  code  eight  indicates  that  the  record  to  be 
selected  is  the  next  record  (if  any)  which  is  of  the  same 
type  as  the  specified  record?  in  the  set  of  the  specified 
type?  and  matches  the  soecified  record  in  the  specified 
items.  The  first  selection  code  is  a  four  byte  data  base 
key.  The  second  selection  code  is  a  set  type  index.  The 
remaining  selection  codes  are  item  specifications. 

5.  Get  Message  (Code  5). 

Following  the  function  code  is  a  four  byte  data  base 
key.  The  remainder  of  the  message  consists  of  item  doubles 
for  the  items  the  user  program  desires.  An  item  double  con¬ 
sists  of  an  item  specification  and  a  one  byte  data  type.  An 
omitted  subscript  in  a  data  specification  means  every 
occurence  of  the  vector  or  repeating  group  is  desired.  A 
double  for  a  repeating  arouo  has  a  data  type  of  zero.  The 
doubles  for  the  elements  in  the  repeating  arouo  must  be 
immediately  following  the  reoeatina  group’s  double. 

6.  Store  Message  (Code  12). 

The  remainder  of  the  message  is  a  one  bvte  record 
t  ype  index. 
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7,  ResDonse  to  Request  for  Data  Message  (Code  100). 

This  message  cpovides  data  requested  in  a  Request 
for  Data  Messaoe  sent  by  the  schema  DBM.,  The  function  code 
is  followed  by  an  item  specification  for  each  requested  item 
consisting  of  a  one  byte  record  type  index»  a  one  byte  item 
index  and  a  one  bvte  data  type.  The  requested  data  follows 
the  item  type  specifications  in  exactly  the  same  order  as  in 
the  Request  for  Data.  For  oroup  itemsf  the  order  of  appear¬ 
ance  of  the  subordinate  items  of  the  group  in  the  item 
specifications  is  the  order  the  items  must  appear  within 
each  occurence  of  the  group  item  in  the  data  portion  of  the 
message.  The  other  data  may  be  area  or  set  type  indices  or 
data  base  keys. 

3.  Insert  Message  (Code  7). 

The  function  code  is  followed  by  a  four  byte  data 
base  key  of  the  record  to  be  inserted.  The  remainder  of  the 
message  consists  of  set  specifier  pairs.  A  set  specifier 
pair  consists  of  a  one  byte  set  type  index  followed  by  a 
four  byte  data  base  key  indicating  the  current  record  in  the 
current  occurence  of  the  set  of  the  type  specified. 

9.  Remove  Message  (Code  11). 

Following  the  function  code  is  a  four  byte  data  base 
key.  The  remainder  of  the  message  is  zero  or  more  one  byte 
set  type  indices. 
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10.  Modify  Message  (Code  8) 


Following  the  function  code  is  a  one  byte  number 
indicating  how  many  set  membershios  are  being  modified^  fol” 
lowed  by  the  reguired  number  of  set  specifier  pairs.  The 
remainder  of  the  message  consists  of  zero  or  more  item  tri¬ 
ples. 

11.  Delete  Message  (Code  2). 

Following  the  function  code  is  a  four  byte 
base  key.  The  remainder  the  message  is  a  one  byte 
tion  code  with  the  same  values  as  for  the  parameter  of 
"delete"  function  (see  Appendix  A,  Section  B.2.i). 

B.  Messages  Transmitted  by  the  Schema  D8M. 

Messages  transmitted  are  in  response  to  messages 
received  and  fall  into  two  categories:  normal  responses  and 
error  messages.  The  first  byte  of  the  message  is  a  response 
code  and  is  zero  for  normal  responses  and  eoual  to  the  error 
code  for  error  messages.  The  format  of  the  responses^  after 
the  first  byte»  varies  depending  on  the  previously  received 
message  (for  normal  responses)  or  on  the  error  type  (for 
error  messages).  These  formats  are  detailed  below. 

1.  Normal  Responses. 


data 
de  1  e- 
t  he  C 
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a.  Initial  Call  Message. 

The  normal  response  to  an  initial  call  is  an 
encoded  schema  description  for  the  user.  This  descriotion 
is  a  series  of  one  byte  numbers  each  of  which  represents  the 
index  number  associated  with  the  data  base  names  (except  the 
schema  name)  in  the  Initial  Call  Message.  For  areas#  this 
is  the  index  number  of  the  area#  for  records#  the  index 
number  of  the  record  followed  by  the  index  number  of  each 
data  item  or  data  aaqregate#  for  sets#  the  set  number. 

b.  Find  Message. 

The  normal  response  to  a  Find  Message  contains 
the  information  necessary  to  establish  currency  for  the 
selected  record.  This  information  consists  of  a  four  byte 
data  base  key;  a  one  byte  record  tyoe  index  indicating  the 
type  of  the  record#  and  zero  or  more  one  byte  set  type 
indices  indicating  the  set  types  for  the  set  occurences#  in 
which  the  record  oa r t i c i oa t es . 

c.  Store  Message. 

The  normal  response  to  a  Store  Message  is  a 
Request  for  Data  Message.  This  message  requests  the  data 
needed  to  perform  the  store  function.  The  message  is  com¬ 
posed  of  request  entries  each  prefixed  by  a  one  byte  request 
type  code.  The  request  entry  formats  are  listed  below  alonq 
with  their  request  tyoe  codes. 


(1)  Data  Item  Reauest  (Code  0).  This  reauest 
entry  consists  o^  an  item  soec i f i c a t i on  ,  A  reoeatinq  qrouo 
index  is  an  implied  request  for  all  subordinate  elements  in 
the  repeatinq  group  known  to  the  sub-schema.  If  a  subscript 
is  missinq/  the  data  in  all  occurences  of  the  relevant  ele¬ 
ment  is  requested. 

(2)  Area  Index  Request  (Code  1).  This  request 
consists  of  the  request  code  alone.  It  requests  the  con¬ 
tents  of  "areaid"  in  the  user  program, 

(3)  Data  Base  Key  (Code  2).  This  request  con¬ 
sists  of  the  request  code  alone.  It  requests  the  data  base 
key  associated  with  "keyname"  in  the  user  program. 

( )  Current  of  Set.  This  request  consists  of  a 
set  type  index.  It  requests  the  data  base  key  of  the 
current  record  of  the  SPeciHed  set  type. 

d.  Request  for  Data  Message. 

The  normal  response  to  a  Request  for  Data  is 
identical  to  that  for  a  find.  The  record  information  passed 
is  for  the  record  just  stored. 

e.  Messaoes  with  a  Null  Response. 

The  normal  response  to  certain  messages  is  a 
resoonse  code  only.  These  messages  are  Ooen/  Close/  Insert/ 


Moaify»  Remove  and  Delete 


2.  Error  Messaoes. 

a.  Invalid  Sub-schema  (Code  60). 

This  response  is  the  result  of  a  mismatch 
between  the  schema  and  the'  sub-schema  oresented  in  the  user 
program's  Initial  Call  Message.  After  the  error  code  is  a 
one  byte  first  error  type:  zero  for  schema  entry?  one  for 
area  entry?  two  for  record  entry?  and  three  for  set  entry. 
Following  this  code  are  three^  one  byte  entries  giving  the 
number  of  the  first  erroneous  entry  in  area^  record  and  set 
entries  respectively. 

b.  Area  Already  Ooen  (Code  28). 

This  response  to  an  Open  Message  has  a  one  byte 
error  count  following  the  response  code. 

c.  Truncation  of  Data  (Code  54). 

This  response  to  a  Get  Message  has  identical 
format  to  the  normal  resoonse  to  a  Get  Message  except  for 
the  response  code. 

d.  Messages  with  Error  Code  Only.  The  remaining 
error  responses  consist  of  an  error  code  only  as  follows. 


(1)  Data  Base  Key  Invalid  (Code  2) 


a) . 


(Code 


(Code 


la) . 


(Code 


(Code 


(2)  Data  Items  Invalid  op  Inconsistent  (Code 

(3)  Violation  of  DUPLICATES  NOT  ALLOWED  clause 

5)  . 

(y)  End  of  Set  or  Area  (Code  7). 

(5)  Invalid  Record  or  Set  Index  (Code  8). 

(6)  Attempted  Update  on  Retrieval  Only  Area 

9)  . 

(7)  Privacy  Breach  Attempted  (Code  10), 

(8)  '^edia  Space  not  Available  (Code  11). 

(9)  Data  Base  Key  not  Available  (Code  12). 

(10)  Insert  into  Mandatory  Automatic  Set  (Code 

(11)  Remove  out  of  Mandatory  Set  (Code  15), 

(12)  Insert  into  Set  with  Existing  Membershio 

16)  . 

(13)  Implicitly  Referenced  Area  not  Available 

18)  . 

(1^)  Affected  Area  not  Ooen  (Code  21). 
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(15)  Illegal  Area  Index  (Code  23) 


(l(j)  Set  Occurence  would  Scan  Temporary  and 
Permanent  Areas  (Code  2^). 


(17) 

Mo  Set  Occurence  Satisfies  Specified  Arqu- 

ment  s  (Code  25 )  . 


(18) 

Mo  Record  Satisfies  Record  Selection 

Expression  (Code 

26) . 

(19) 

CHECK  Clause  Violated  (Code  27), 

(20) 

Usaoe  Mode  Conflict  with  Other  Processes 

(Code  29). 


(21  ) 

UnauaHfied  DELETE  on  Owner  of  a  Non-empty 

Set  (Code  30 ) . 


(22) 

No  Initial  Call  Message  (Code  bl). 

(23) 

Indecipherable  or  Uno rocc essab 1 e  Message 

( Code  100) 


APPENDIX  F.  DBM  SKELETON  PROGRAM. 


The  skeleton  program  is  identical  for  every  schema  DBM, 
Compiled  schema  DBM's  differ  only  in  the  values  associated 
with  certain  DEFINE'd  constants  controlling  array  sizesr  in 
the  initialization  of  certain  arrays#  and  in  the  data  base 
procedures  which  are  included  in  the  compiled  version.  When 
the  schema  DBM  is  executed#  it  initializes  its  tables  from 
the  schema  description  in  the  Schema  Description  File. 
These  tables  drive  the  processing  of  the  data  base.  The 
schema  DBM  concurrently  reads  the  user's  sub-schema  descrip¬ 
tion  from  the  interprocess  communication  pipe.  The'  sub¬ 
schema  description  is  validated  and  index  numbers  are  pro¬ 
duced  to  allow  translation  of  user  requests  and  data  into 
system  requests  and  data. 

This  Appendix  describes  the  data  organization  of  the 
skeleton.  Documentation  for  each  service  routine  and  util¬ 
ity  routine  is  contained  in  the  source  program  listings. 
Listings  and  machine  readable  source  of  the  DBM  skeleton  can 
be  obtained  by  contacting  the  Department  of  Computer  Science 
(Code  52Rs).  For  an  explanation  of  the  values  associated 
with  DEFINE'd  constants  mentioned  in  this  Appendix  see 
Appendix  H.  For  a  description  of  a  user's  view  of  the  ser¬ 
vice  routines#  see  Appendix  A, 
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A.  General  Tables 


Certain  tables  and  buffers  are  available  for  use  by 
other  tables, 

1,  The  character  buffer,  "scharbuf”,  is  dimensioned  by  a 
DEFINE'd  constant.  It  is  used  to  stored  character  strings 
and  other  relatively  short  variable  length  data. 

2,  "Procname"  is  an  array  of  strings  containing  the 
names  of  the  data  base  orocedures.  "Procpoint"  is  an  array 
of  function  pointers  pointing  to  the  functions  defined  in 
"orocname".  These  arrays  are  used  to  set  uo  the  data  base 
procedure  pointers  used  in  other  tables.  Both  "procname" 
and  "procpoint"  are  dimensioned  by  a  DEFINE'd  constant, 

3,  The  orivacy  vector  array  ("ovec")^  dimensioned  by  a 
DEFINE'd  constant,  is  used  for  data  item  and  data  agoregate 
orivacy  information.  Its  elements  are  structures  of  type 
"orivect".  The  format  of  a  "orivect"  structure  is 

struct  orivect  { 

char  Ptype»  //  tyoe  of  privacy  lock 

char  *olock;  //  oointer  to  privacy  lock 

} 

The  "ptype"  code  is  "s"  if  the  "plock"  oointer  ooints  to  a 

string,  and  "o"»  if  a  data  base  procedure  is  indicated, 

A,  The  record  buffer  array,  "srecbuf"»  contains  all  the 
record  buffers  for  areas,  records  and  sets. 
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B.  Organization  for  Area  Management 


The  schema  DBM  contains  an  array  of  structues  of  type 
"areavect''»  dimensioned  by  a  DEFINE'd  constantf  called 
"avec”.  Each  structure  in  the  array  is  used  to  describe  an 
area  in  the  data  base.  The  format  of  an  "areavect"  struc¬ 


ture  is  as  follows: 


struct  areavect  { 

i  nt  a  f 1 ags •  // 

intause#  // 

char  * ada t aoa t h ? / / 
int  adatades/  // 
char  ♦akeyoath;  // 
int  akevdes;  // 
char  acrecloc[3J; 
char  *acurrec:  // 
i  nt  acur key  f 21  »  // 


see  below 

usage  count  of  last  reference 
pointer  to  path  to  data  file 
file  descriptor 
pointer  to  oath 
file  descriptor 
//  1 ocat i on  of 
pointer  to  current  rec  buffer  _ 
first  key  if  in  current  key  buff 
//  buffer  for  db  key 


for  data  file 
to  key  file 
for  key  file 
current  record 


maop 1 ngs 


char  akeybuf  1768] 
int  (  *p ro vec ) ( )  r 1 ^ ]  » //  pointers  to  db  procedures 
int  apflags?  //  permission  flags  for  functions 
int  awaste:  //  current  waste  count 

} 


"Aflags"  is  formed  by  a  bit-wise  OR  of  the  following 


octal  codes: 


TEMP 

0100000 

The  area  is  temoorary 

PHSOP 

oaoooo 

The  files  are 

physical  1 y 

Open 

KETBMOO 

020000 

Current  key  block  modifi 

ed 

CRECMOD 

010000 

Current  record 

mod  1 f i ed 

KEYBVAL 

oaooo 

Key  block  is  valid 

CRECVAL 

02000 

Record  buffer 

is  valid 

CRECSIZ 

01000 

Current  record 

has  increased  size 

KNOWN 

0000 

Area  is  known 

to  the  sub 

-schema 

RETRV 

01 

Area  open  for 

retrieval 

PRETRV 

02 

Area  ooen  for 

prof  ected 

retrieval 

ERETRV 

03 

Area  open  for 

exclusive 

retrieval 

UPDT 

oa 

Area 

Open 

for 

update 

PUPOT 

05 

Area 

open 

for 

protected 

upda  t  e 

EUPOT 

06  ' 

Area 

open 

for 

exclusive 

uoda  t  e 

The  "aoflags"  are  set  when  the  user  sub-schema  is  vali¬ 
dated.  These  flags  indicate  the  functions  allowed  the  user 


the  area 

as  follows: 

0100000 

Retrieval 

oaoooo 

Protec  ted 

retrieval 

020000 

Exclusive 

retrieval 

010000 

Update 

oaooo 

Protected 

update 

02000 

Exclusive 

uoda  t  e 

C,  Logical  Usage  Block. 

The  logical  usaae  block  records  the  current  usage  mode 
for  each  area  in  the  data  base  currently  being  used  by  any 
schema  DBM.  The  loaical  usaae  block  is  organized  into  two 
byte  integer  entrieSf  one  for  each  area  in  the  data  base. 
Each  two  byte  entry  is  divided  into  four  fields:  bit  15  is 
the  exclusive  use  bit?  bits  through  ten  form  a  count  of 
retrievers?  bits  nine  through  five»  a  count  of  orotected 
retrievers?  and  bits  four  through  zero  form  a  count  of 
uodaters.  The  loaical  usaae  block  can  record  uo  to  31  users 
in  each  category.  If  a  schema  08M  has  an  area  ooen  for  a 
protected  modef  the  updater  count  is  set  to  31. 
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D,  Organization  for  Record  Management. 

The  schema  08^  contains  an  array  of  structures  of  type 
"recvector**.  Each  element  of  the  array  is  the  record  vector 
for  a  record  type  defined  in  the  schema.  The  format  of  a 
"recvector”  structure  is: 


struct  recvector  { 

int  rflagsJ  //  see  below 

char  rlocmod[3];  //  see  below 
char  rarea[3]?  //  see  below 

int  ( * rp r o vec ) ( ) [ 1 ^ ?  //  pointers  to  db  procedures 
char  rnumsets?  //  number  of  set  types  for  record 
struct  member  ^rsets?  //  pointer  to  member  vectors 
char  rnumitem;  //number  of  items  in  record 
struct  itemvect  ^ritemsr  //  pointer  to  item  vectors 
int  rpflags/  //  permission  flags  for  functions 
char  ^rcurrec?  //  pointer  to  current  record  buffer 
) 


’•Rflags”  is  currently  used  only  to  indicate  whether  or  not 
the  sub-schema  knows  about  the  record  type.  The  octal  code 
KNOWN  (0400)  is  used  for  this  function. 


'•Rlocmod”  is  derived  from  the  LOCATION  clause  of  the 
schema  RECORD  entry  (see  Section  3.3.4,  of  Ref.  2)  and  is 
interpreted  as  follows.  Character  zero  gives  the  location 
mode:  zero  for  DIRECT  with  key  passed  as  a  parameter;  one 
for  DIRECT  with  key  stored  in  a  record;  two  for  CALC  using 
the  standard  key  transformation  with  no  duplicates;  three 
for  CALC  using  the  standard  key  transformation  with  dupli¬ 
cates  allowed;  four  for  CALC  using  a  data  base  procedure 
with  no  duplicates;  Mve  for  CALC  using  a  data  base  pro¬ 
cedure  with  duplicates  allowed;  six  for  VTA  a  set;  and  seven 
for  SYSTEM  mode.  The  last  two  bytes  in  the  ’’riocmod*’  vary 


in  meaning  depending  on  the  mode 


For  mode  zeror  bytes  one 


and  two  are  unused.  For  mode  one^  byte  one  contains  the 
record  type  and  byte  two#  the  item  index  of  the  data  item 
which  holds  the  data  base  key*  For  modes  two  throuqh  five# 
bytes  one  and  two  hold  a  pointer  to  the  randomizing  key 
description*  A  randomizing  description  is  a  null  terminated 
series  of  bytes#  the  Hrst  containing  the  item  index  of  the 
key  link  item  and  subsequent  bytes  containing  the  item 
indices  of  the  Helds  of  the  randomizing  key*  For  modes 
four  and  five#  the  randomizing  key  description  is  headed  by 
a  pointer  to  the  appropriate  data  base  procedure*  For  mode 
six#  byte  one  contains  the  set  index  of  the  set  to  be  con¬ 
sulted  and  byte  two  is  unused* 

*’Rarea”  is  derived  from  the  VvTTHIN  clause  of  the  schema 
RECORD  entry  and  is  formatted  as  follows*  Byte  zero  is  the 
WITHIN  option  code  and  has  the  following  interpretation: 
zero#  all  records  are  within  a  single  area#  one#  multiple 
areas  are  possible  (selected  by  a  user  input  value)#  two# 
the  area  will  be  the  area  of  the  owner  of  the  set  of  a 
specified  type  in  which  the  record  participates*  The  values 
of  bytes  one  and  two  of  '*rarea'*  vary  depending  on  the  WITHIN 
option:  for  zero  and  two#  byte  one  contains  an  area  index 
number  and  byte  two  is  unused#  for  one#  bytes  one  and  two 
contain  a  pointer  to  a  WITHIN  criteria*  A  WITHIN  criteria 
is  a  null  terminated  string  of  bytes  each  containing  one  of 
the  allowed  area  numbers  for  this  record* 
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1  .  Member  Vectors 


Each  record  vector  contains  a  pointer  ("rsets")  to 
an  array  of  set  membership  vectors  if  it  participates  in  any 
sets.  A  set  membership  vector  is  a  structure  of  type 
"member".  The  format  of  the  "member"  structure  is: 

struct  member  < 


char 

msetnum? 

// 

set  index 

for 

this  entry 

i  n  t 

m  f 1 ags  » 

// 

see  below 

i  nt 

mo  rde  r ; 

/* 

flag  bits 

for 

key 

Item 

0  =  ascend! nq/ 

1 

=  descend i no* / 

char 

*mokey  * 

// 

pointer  to 

i  terns 

for  prime  key 

char 

★★mskey / 

// 

pointer  to 

SEARCH 

1 ndex  DO! nters 

char 

mse 1 f 1 ag / 

// 

root  selection 

f  1  ag 

i  nt 

*mse 1  id? 

// 

pointer  to 

root  selection  id 

i  n  t 

(*msse1)()/ 

// 

pointer  to 

set 

selection  soec 

i  n  t 

( *mo rovec ) ( )  [61 

;//  pointer  to 

db 

procedures 

char 

mol f  ags / 

// 

permission 

flags 

for  functions 

> 

The  "mflaqs"  for  a  member  entry  is  formed  by  a  bit¬ 
wise  OR  of  the  followino  codes: 


MMANO 

0100000 

Member sh i p 

is  mandatory 

MAUTO 

oaoooo 

Membe  r sh i p 

i s  automat i c 

ML  INK 

oaoooo 

Member  i s 

1 i nked  to  owner 

MSSEL 

010000 

Set  selection  by  db  procedure 

MPKEY 

oaooo 

Primary  key  is  defined 

MPRKEY 

02000 

RANGE  option 

MPFKEY 

01000 

Dup 1 i c  a t  es 

first 

MPLKEY 

oaoo 

Dupl i cat  es 

last 

MPOKEY 

0200 

Dup 1 i c  a  t  es 

arbitrary 

MPNKEY 

0100 

Nulls  allowed 

MKNOWM 

oao 

Membe  rs  h i d 

is  known  to  sub-schema 

Whenever  any  of  MPRKEY  throudh  MR M KEY  are  set/  MPKEY  must  be 
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set.  Additionally,  the  low  order  three  bits  of  "mflags" 
contain  the  number  of  secondary  indices  defined  to  support 
SEARCH  keys  for  this  membership. 

The  pointer  "mskey"  points  to  a  strinq  of  pointers 
dimensioned  by  the  count  stored  in  "mflags".  Each  pointer 
points  to  a  null  terminated  string.  The  first  byte  of  this 
string  indicates  whether  duplicates  are  allowed,  the  second 
byte  is  the  item  index  for  the  owner  record  item  linking  the 
search  index;  and  the  remaining  bytes  are  item  indices,  each 
representing  a  Held  in  the  SEARCH  key  for  the  search  index. 
Duplicates  are  allowed  if  the  first  byte  of  the  string  is  a 
one  and  not  allowed  if  it  is  a  two. 

"Mpflags"  is  set  when  the  sub-schema  is  validated. 
These  flags  indicate  the  functions  allowed  the  user  for  the 
record/set  pair  as  follows: 

0200  Insert 

0100  Remove 

OaO  Find. 

When  "mflags"  has  MSSEL  set,  "mssel"  is  a  pointer  to 
a  data  base  procedure  for  set  selection  and  "mselflag"  and 
"mselid"  are  unused.  If  MSSEL  is  not  set,  "mselflao"  is  a 
code  describing  the  root  set  selection  in  the  set  selection 
chain  for  this  member.  The  possible  values  of  "mselflag" 
are  as  follows:  one  for  singular  sets;  2  for  current  of  set 
type;  three  for  throuoh  data  base  key;  and  four  for  through 
CALC  key.  The  data  in  "mselid"  depends  on  the  value  of 
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"mselflag".  For  singular  sets  and  current  of  set  selection^ 
"mselid”  contains  the  set  index  of  the  root  set.  For  selec¬ 
tion  by  data  base  key.  "mselid"  is  not  used.  For  selection 
by  CALC  key.  "mselid"  is  a  oointer  to  a  null  terminated 
string  of  character  pairs  which  are  the  record  and  item 
indices  of  the  items  to  be  used  in  forming  the  CALC  key. 

When  set  selection  is  not  by  a  data  base  procedure 
and  the  number  of  THRU  clauses  in  the  SELECTION  clause  for 
this  member  entry  is  greater  than  one.  then  a  selection 
chain  exists  and  "mssel"  is  a  pointer  to  a  null  terminated 
string  of  byte  pairs.  Each  pair  in  this  string  describes 
the  set  selection  for  one  of  the  successive  set  types  in  the 
set  selection  chain.  Each  oair  consists  of  the  index  of  the 
next  set  in  the  chain  and  the  index  of  the  data  item  which 
must  be  matched  in  the  owner  record. 

2.  Item  Vectors. 

Each  record  contains  a  oointer  ("ritems")  to  an 
array  of  item  description  vectors.  Each  element  of  the 
array  is  a  structure  of  tyoe  "itemvect"  and  describes  one  of 
the  fields  appearing  in  the  record.  A  field  may  be  in  a 
record  for  CALC  key  linkage,  for  set  linkage  or  as  a  result 
of  a  data  sub-entry  in  the  record’s  source  description.  The 
format  of  an  "itemvect"  structure  is: 
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struct  itemvect  { 


char  * i name  7  / / 

char  i level;  // 

char  i  type/  // 

char  ★ides?  // 

char  *  i  chec  k  ;  // 

i  n  t  i  s  i  2  e ;  /  / 

int  inocc;  // 

int  isbyteJ  // 

int  ( * i orovec ) ( )  [6] 
struct  or i vector  *i 
char  ipflags?  // 

> 


oointer  to  name  of  item 
item  level 

type  of  data  represented 
data  description  pointer 
oointer  to  validity  check 
size  of  one  occurence  of  item 
number  of  occurences  of  item 
starting  byte  within  record 
;//  pointers  to  db  procedures 
ovec ; //  po i nt er  to  privacy  locks 
item  orivacy  flaas 


The  "i level”  entry  specifies  the  level  number  of  the 


item.  A  level  number  beween  one  and  100  indicates  the  item 


was  generated  by  an  item  sub-entry;  level  lOlf  a  forward 


chain  link  for  a  set;  102^  a  backward  link;  103^  a  link  to 
owner;  lOA^  an  owner's  link  to  first  member;  105»  an  owner's 
link  to  last  member;  106^  an  owner's  link  to  index;  and  107/ 


a  CALC  synonym  link. 


”Itype"  is  the  data  type  code  for  the  item:  zero  for 
repeating  groups;  one  for  a  PICTURB'd  character  string;  two 
for  a  PICTURB'd  numeric  string;  three  for  a  binary  integer; 
five  for  a  single  precision  floating  point  number;  six  for  a 
double  precision  floating  point  number;  seven  for  a  charac¬ 
ter  string;  eight  for  a  bit  strino;  and  nine  for  a  data  base 
ke y . 

If  the  item  is  a  set  link/  "ides"  is  the  set  index 
for  the  set.  If  the  item  has  a  oicture  specified/  "ides"  is 
a  pointer  to  a  character  string  containing  the  oicture  (see 
Ref.  2/  Section  3.3.8  for  a  descriotion  of  PICTURB'd  data). 
In  other  cases/  "ides"  is  unused. 
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"Icheck”  is  a  do  inter  to  a  validity  checking 
description  for  the  item.  The  first  character  in  the 
description  is  a  flag  byte.  If  the  high  order  bit  of  the 
flag  byte  is  on»  the  picture  is  used  as  a  check.  If  bit  six 
is  on/  a  data  base  procedure  is  used  as  a  check.  If  bit 
five  is  on/  check  values  are  used.  If  a  data  base  procedure 
is  specified/  a  pointer  to  the  procedure  is  stored  immedi¬ 
ately  after  the  flag  character.  If  check  values  are  speci¬ 
fied  they  are  stored  at  the  end  of  the  validity  checking 
description.  Check  values  consist  of  a  series  of  check 
entries  seperated  by  ASCII  comma  characters  and  terminated 
by  a  null  byte.  Each  check  entry  is  either  a  literal  of  the 
same  format  as  the  item  or  a  pair  of  such  literals  separated 
bv  an  ASCII  dash  character. 


The 

dated.  The 
0200 
0100 
040 


"ipflags''  are  set  when  the  sub-schema  is 
octal  codes  and  function  permissions  are: 
Store  is  permitted 
Get  is  permitted 
Modify  is  permitted 


va  1  i  - 


E.  Organization  for  Set  Management. 

The  schema  OBM  program  contains  an  array  of  set  vectors. 
Each  set  vector  describes  one  of  the  set  types  defined  in 
the  schema  and  is  a  structure  of  tyoe  "setvect''.  The  format 
of  a  "setvect"  structure  is: 
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struct  setvect  ( 

int  sflags?  //  see  below 

cbar  sownerJ  //  owner  record  type 

char  sfitem?  //  index  of  1st  item  for  set 

char  *smemb»  //  oo inter  member  description 

int  ( *sprovec ) ( )  [yj  ;  //  pointers  to  db  functions 
char  spflags?  //  function  permission  flags 
char  scurownl3] !  //  db  kev  of  current  owner  rec 
char  *scurrec;  //  do  inter  to  current  record  buf 
> 


The  value  of  "sflags"  is  formed  by  a  bit-wise  OR  of  the 
following  octal  codes; 


KNOWN  oaoo 
SYSTEM  0200 
DYNAMIC  0100 
PRIOR  oao 
INDEXED  020 


Set  tyoe  is  known  to  sub-schema 
S i nqu 1  a r  set 
Dynamic  set  type 
Members  contain  backward  links 

I 

Primary  set  order  is  via  an  index 
The  lower  four  bits  of  "sflaas"  indicate  the  order  criteria 
for  the  set;  2ero»  the  order  is  immaterial;  one»  new  records 
are  inserted  on  the  front  of  the  set;  two»  new  records  are 
inserted  at  the  end  of  the  set;  three/  new  records  are 
inserted  after  the  current  record  of  the  set;  four,  new 
records  are  inserted  prior  to  the  current  record;  five 
through  11/  a  sorting  order.  Five  indicates  sorted  by  data 
base  key;  six/  sorted  by  record  names  and  then  by  member 
keys;  seven/  sorted  by  the  member  record  keys  with  relation¬ 
ship  between  records  of  different  types  immaterial;  and 
eight  through  11  indicate  sorted  by  member  keys  (this 
implies  that  the  format  of  each  member  record's  keys  is  the 
same).  The  last  four  codes  soecify  duolicate  orocessing; 
eight/  duplicates  are  allowed;  nine/  duplicates  are  first; 
ten/  duplicates  are  last;  and  11/  duplicates  are  not 
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allowed.  Items  in  the  owner  record  dealing  with  the  set  are 
assumed  to  be  stored  contiqously. 

”Smemb*^  points  to  a  null  terminated  string  of  bytes 
indicating  the  member  record  types  for  the  set.  The  string 

is  made  up  of  three  byte  entries.  The  first  byte  is  the 
member  record  index?  the  second  is  the  membership  vector 
index  of  the  member  record  for  this  set?  and  the  third  is 
the  item  vector  index  of  the  first  item  in  the  record  deal¬ 
ing  with  this  set.  A11  items  having  to  do  with  the  set  are 
assumed  to  be  stored  contiguously  in  the  member  records. 

The  *^spflags*'  are  set  when  the  sub-schema  is  validated. 
These  flags  indicate  the  function  allowed  the  user  for  the 
set  : 

0100  Insert  is  allowed 

0^0  Remove  is  allowed 

020  Find  is  allowed 
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APPENDIX  G.  DIFFERENCES  IN  THE  SCHEMA  DDL. 


This  Aopendix  gives  a  detailed  description  of  the 
differences  between  the  DDL  in  the  UNIX  DBMS  and  that 
described  in  Ref.  2.  The  Appendix  is  organized  in  parallel 
with  Section  5  of  Ref.  2  and  section  references  below  are 
sections  in  Ref.  2  unless  otherwise  no^ed.  The  meta- 
1 anguage  used  to  describe  entries  is  identical  to  that  of 
Ref.  2  with  the  exceptions  that  no  distinction  is  made 
between  reguired  or  optional  words  and  that  options  enclosed 
in  brackets  are  separated  bv  virqules  ('*/")  in  lieu  of  being 
on  separate  lines. 

A.  Words. 

The  rule  in  section  3.0.3  for  forming  words  applies  to 
the  DDL.  However/  when  validating  a  sub-schema/  the  DBMS 
considers  upper  and  lower  case  letters  to  be  equivalent  and 
considers  underscore  ("<■")  a  synonym  for  hyphen 

B.  Schema  Entry  (Section  3.1.0). 

The  "ON  (ERROR  DURING! '*  clause  is  not  supported.  In  the 
"PRIVACY  LOCK"  clause/  onlv  the  "(FOR  COPY!"  ootion  is  sup¬ 
ported.  Specifying  alternate  privacy  locks  for  the  same 
function  is  not  supoported. 
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C.  Area  Entry  (Section  3.2.0). 

Soecifyinq  alternate  privacy  locks  for  the  same  function 
is  not  supported. 

D»  Record  Sub*ent ry  (Section  3.3.0). 

In  the  "PRIVACY  LOCK"  clause^  specifying  alternate 
orivacy  locks  is  not  supported.  Specifying  a  data  base  pro¬ 
cedure  for  area  selection  in  the  "LOCATTOM  MODE"  clause  is 
not  supported. 

E.  Data  Sub-entry  (Section  3.3.0). 

In  the  "TYPE"  clause#  the  only  arithmetic  tyoes  sup¬ 
ported  are  "BINARY  FIXED",  "BINARY  FLOAT  a"  and  "BINARY 
FLOAT  8".  The  word  "BINARY"  is  assumed  if  missing,  and  "4" 
is  assumed  if  neither  "4"  nor  "8"  is  specified.  If  the 
"TYPE"  clause  uses  the  "BIT  inteqer-3"  option,  "integer-3" 
must  be  a  multiple  of  eight.  Since  all  records  must  be 
fixed  length,  the  "OCCURS  da t a-base- i den t i f i e r- 1  TIMES" 
oDtion  is  not  supported.  "RESULT"  and  "SOURCE"  items,  both 
virtual  and  actual,  are  not  supoorted.  The  "FOR 
{ENCODIMG/OECOOING} "  clause  is  not  supoorted.  Soecifyinq 
alternate  privacy  locks  on  the  same  function  is  not  sup¬ 
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F.  Set  Sub-entry  (Section  3.^l.0). 

The  "TEMPORARY"  ootion  of  the  "ORDER"  clause  is  not  sup¬ 
ported,  The  "(INDEXED  (NAME  IS  i ndex-name-ll 1 "  clause  is 
not  supported.  Specifyina  alternate  privacy  locks  for  the 
same  function  is  not  supported. 

G.  Member  Sub-entry  (Section  3,^,0), 

In  the  "RANGE  KEY"  clause/  no  more  than  sixteen  data- 
base-identifiers  may  be  specified.  The  "DUPLICATES  NOT 
ALLOWED  FOR"  clause  is  not  supported.  No  more  than  seven 
"SEARCH  KEY"  clauses  can  be  specified.  In  the  "SEARCH  KEY" 
clause/  the  "USING"  phrase  is  not  meaningful  since  all 
search  keys  are  implemented  using  indices.  In  Format  I  of 
the  "SET  SELECTION"  clause/  the  "DATA-BASE-KEY  EQUAL  TO 
data-base-identifier-1"  and  "CALC-KEY  EQUAL  TO  data-base- 
data-name-2  (dat a-base-dat a-name-3)  ..."  options  are  not 
supported.  In  the  same  clause/  the  only  form  of  the  "THEN 
THRU"  phrase  supported  is  without  the  "EQUAL  TO"  option. 
Specification  of  alternate  privacy  keys  on  the  same  function 
is  not  supported. 
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APPENDIX  H.  constant  FILE  CONTENTS 


As  mentioned  in  Section  TIT.L.W  a  constant  file  must 
be  created  by  tfie  DBMS  compiler  to  allow  the  skeleton  pro* 
gram  to  be  transformed  into  the  schema  DBM  for  a  particular 
data  base.  This  file  dimensions  the  tables  and  arrays  of 
the  schema  DBM  and  initializes  arrays  for  the  processing  of 
data  base  procedures.  The  specific  tables  and  arrays  are 
described  below, 

A,  The  Character  Buffer. 

The  character  buffer,  "scharbuf",  is  utilized  for 
storage  of  character  strings  and  several  other  types  of 
variable  length  data.  This  charact-er  buffer  must  be  large 
enough  to  contain  the  schema  name?  rhe  oath  names  to  all 
schema  files  (including  temporary  ones)?  the  item  names  of 
every  item  in  every  record?  the  primary  key,  search  key  and 
selection  data  for  every  membership  vector?  the  data  and 

I 

validity  check  descriptions  of  every  item  vector  in  every 
record?  and  the  member  record  strina  for  every  set  vector. 
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B.  Data  Base  Procedure  Table. 

This  table  is  conoosed  oT  two  arrays.  The  first  arrays 
"Drocname"»  is  an  array  of  character  strings  and  must  be 
initialized  with  the  name  of  every  data  base  procedure  men¬ 
tioned  in  the  schema.  The  second  arrays  "o rocoo i n t ” f  is  an 
array  of  function  pointers  which  must  be  initialized  to 
point  to  the  data  base  procedures  listed  in  "procname".  The 
references  to  data  base  procedures  in  "orocooint"  cause  the 
C  compiler  to  load  these  functions  into  the  schema  DBM, 

C.  Privacy  Vector  Array. 

This  array  of  structures  of  type  "privector”  must  be 
dimensioned  large  enouah  to  hold  the  maximum  number  of  item 
privacy  locks  defined  in  any  one  record  entry.  The  format 
of  a  "privector"  structure  is  described  in  Appendix  F,  Sec¬ 
tion  A .  3 , 

D.  Record  Buffer  Array, 

This  is  a  character  array  which  is  used  to  provide 
record  buffers  for  the  various  areas»  records  and  sets.  Its 
dimension  must  be  the  number  of  areas  and  sets  times  the 
maximum  record  size  plus  the  size  of  each  individual  record. 
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E.  Area  Vector  Array 


This  array  of  structures  of  type  "areavect"  must  be 
dimensioned  laroe  enough  to  provide  one  area  vector  for  each 
defined  area.  An  "areavect"  structure  is  described  in 
Appendix  F,  Section  0. 

F,  Record  Vector  Array, 

This  array  of  structures  of  tyoe  "recvector"  must  be 
dimensioned  large  enough  to  provide  one  record  vector  for 
each  defined  record  type,  A  description  of  the  "recvector" 
structure  is  contained  in  Appendix  F/  Section  D, 

G,  Member  Vector  Array. 

This  array  of  structures  of  tyoe  "member"  must  be  dimen- 
sioned  large  enough  to  provide  one  member  vector  for  every 
record  membership  defined  in  every  set.  A  description  of 
the  "member"  structure  is  contained  in  Appendix  F ,  Section 

D.  1  . 

H,  Item  Vector  Array. 

This  array  of  structures  of  type  "itemvect"  must  be 
dimensioned  large  enough  to  provide  an  item  vector  for  every 
item  in  every  record  tyoe,  A  description  of  the  "itemvect" 
structure  is  contained  in  Appendix  F,  Section  D,2. 
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I.  Set  Vector  Array 


This  array  of 
dimensioned  large 
set  defined  in  the 
structure  is  conta 


structures  of  type  ’’setvect"  must  be 
enough  to  provide  a  set  vector  for  every 
schema.  A  description  of  the  "setvect" 
n  in  Appendix  F»  Section  E. 


V 
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